A Survey on Optical Network-on-Chip Architectures

Optical on-chip data transmission enabled by silicon photonics (SiP) is widely considered a key technology to overcome the bandwidth and energy limitations of electrical interconnects. The possibility of integrating optical links into the on-chip communication fabric has opened up a new research field, Optical Networks-on-Chip (ONoCs), which has been attracting considerable interest from the community. SiP devices and materials, however, are still evolving, and optical data transmission on chip confronts designers and researchers with a whole new set of obstacles and challenges. Designing efficient ONoCs is a challenging task and requires detailed knowledge ranging from on-chip traffic demands and patterns down to the physical layout and the implications of integrating both electronic and photonic devices. In this paper, we provide an exhaustive review of recently proposed ONoC architectures, discuss their strengths and weaknesses, and outline active research areas. Moreover, we discuss recent research efforts in key enabling technologies, such as on-chip and adaptive laser sources, automatic synthesis tools, and ring heating techniques, which are essential to enable widespread commercial adoption of ONoCs in the future.


INTRODUCTION
As the trend toward many-core processors continues, the on-chip communication fabric, commonly implemented as a Network-on-Chip (NoC), has become a limiting factor for performance and power consumption. This is mainly due to the inherent inability of electrical interconnects to scale energy and delay at the same rate as transistors. Figures 1 and 2 outline the significant gaps between these two components for shrinking feature sizes. Although signals on wires can be sped up by inserting repeaters, this measure also considerably increases the energy for data transmission. The latency of electrical interconnects is thus limited by the power budget and is likely to prevent chip multiprocessors (CMPs) from scaling performance and power further through higher core counts. In fact, it is commonly expected that electrical interconnects alone will not be able to satisfy the power and performance demands of future many-core systems [86]. Consequently, they have to be augmented, or even replaced, by more advanced interconnect technologies. Optical on-chip interconnects, enabled by breakthroughs in SiP, are considered a promising candidate in this matter since they provide high-bandwidth data transmission through dense wavelength-division multiplexing (DWDM), ultra-low signal propagation latency of light in silicon, and relatively distance-independent energy consumption. SiP could be implemented monolithically on the same die or on a separate layer through 3D integration, forming the foundation for combining electronic and photonic devices in future chip designs.
89:2 S. Werner et al.
Fig. 1. [25]. Fig. 2. Energy Scaling [13].
To justify a shift away from the well-established electrical interconnect technologies, strong cases have to be made to show that ONoCs are superior in terms of both power consumption and performance. This requires novel ONoC designs that efficiently utilize optical links, advances in SiP devices and materials, and automatic design synthesis tools to assist engineers in the design process.

Scope
SiP provides intriguing opportunities not only for interconnecting on-chip components but also for bridging the gap between the performance gains of higher transistor densities and off-chip memory bandwidth, which is limited because package interconnects restrict the number of pins. With SiP, increasing the off-chip bandwidth is not as constrained by the pin count since DWDM allows multiple wavelengths to share the same link, yielding higher per-pin bandwidth density than electrical interconnects (up to two orders of magnitude [7] [93]). Moreover, going off-chip optically provides considerably higher energy efficiency: optical fibers transmit at 33× lower energy per bit than conventional electrical links across chips [28]. A number of designs with optical chip-to-DRAM and chip-to-chip communication have been proposed [7,28,49,52,111]. SiP has been a transformative technology in wide-area networks and rack-to-rack interconnects and is expanding into the chip-to-chip domain (e.g., rack backplane interconnect). We envision that the trend of using optical interconnects will shortly follow in the on-chip domain and consequently aim to analyze the pertinent work by addressing four major objectives:
(1) Discuss the current state of the art of on-chip optical data transmission.
(2) Show important design considerations, challenges, and pitfalls of implementing SiP in NoCs in terms of power and performance.
(3) Present an exhaustive discussion of recent ONoC proposals, as well as a critical analysis of their strengths and weaknesses.
(4) Discuss adaptive laser control mechanisms and automatic design synthesis tools for ONoCs.
To enable the adoption of ONoCs, research in SiP devices is essential, as is research on how to put these devices together in an NoC architecture. Our main focus is on the latter, but since architecture and technologies cannot strictly be decoupled from each other, we start by discussing the main properties of all required devices and their effect on power, energy, and latency. For more detailed information on SiP devices and materials, we refer the reader to [8]. Understanding device implications is necessary to identify limitations, weaknesses, and strengths of an ONoC design. In general, designing ONoCs requires a deep understanding of the underlying SiP technology in order to arrive at feasible and efficient designs. Moreover, as SiP is a nascent technology with ongoing advances, designers need to stay up to date with currently available technologies and adapt designs accordingly in order to unleash their full potential. We group the design approaches aiming at an efficient use of optical links in NoCs into the following categories:
All-optical NoCs utilize optical data communication only; that is, no electrical links are implemented in the topology. These designs can be simple bus architectures, crossbar designs, or more complex proposals such as wavelength-routed ONoCs, where routing is performed optically and the wavelength sharing mechanism is implemented at the backends interfacing the ONoC.
Hybrid NoCs combine both electrical and optical links in their topology and aim to trade them off as efficiently as possible. These may also include 3D NoCs and NoCs that add other disruptive technologies, such as wireless data transmission. NoCs utilizing electrical routers in intermediate stages of the topology for routing purposes are also included in this category.
Next to ONoC architectures, we discuss two areas that have received much attention from the community and may have a substantial impact on ONoC design: adaptive laser control mechanisms that allow lasers to be switched on and off, and automatic design synthesis tools that help designers cope with design complexity, identify design feasibility, increase productivity, and decrease design costs. This survey is useful for a wide audience, from researchers newly entering this field to experts and design engineers who can use this document as a reference to the state of the art of optical network-on-chip designs, as well as active research areas.

FUNDAMENTALS OF OPTICAL ON-CHIP DATA TRANSMISSION

Silicon Photonic Building Blocks
Figure 3 illustrates the SiP building blocks necessary to perform on-chip data communication between a sender and receiver. A laser source (off-chip) provides wavelengths (λ_1 ... λ_n) that are guided into the chip in an optical fiber and coupled into a waveguide, the on-chip counterpart of an electrical wire. Waveguides can accommodate multiple wavelengths, thereby allowing for parallel data transmission on the same waveguide using DWDM. As depicted in Figure 3, to transmit on n wavelengths, n modulators and n filter/detector pairs are needed at the sender and receiver, respectively. The microring (MR) resonator is the main building block for implementing both modulators and receivers because it can selectively filter wavelengths. MR filters are placed close to a waveguide and couple light of one particular wavelength based on their dimensioning. If an optical signal passing through the waveguide gets coupled, the MR is said to be "in-resonance" with this wavelength ("off-resonance" otherwise). In an MR modulator, light of a certain wavelength is captured by the MR, where it can be manipulated to encode data (commonly on-/off-keying through either carrier injection or carrier depletion [102]). At the receiver side, photodetectors, which convert photons into electrons, typically respond to a wide spectrum of wavelengths. Therefore, in order to extract data modulated on a certain wavelength, an MR is placed in front of the photodetector to filter/select the correct wavelength. For more details on the technology level and different types of modulators and detectors, we refer the reader to [8].
Fig. 3. Basic optical data transmission [103].
The implementation of n modulators/filters to perform DWDM is also referred to as a "modulator/filter bank." We note that electro-optic broadband MR resonators capable of simultaneously switching a wide range of wavelengths also exist [101]; however, they were shown to be less suitable for on-chip networks (see Section 5.1).
Tiles in CMPs typically contain a processor, low-level caches, and an interface to the NoC. In addition, optical data transmission requires backend circuitry to convert electrical data into the optical domain and vice versa (see Figure 3).
The steps to generate and receive an optical signal are shown in Figure 4. On the sender side, electronic data is first encoded and conditioned for error correction and signal conditioning purposes [8]. Subsequently, data serialization raises the transmission data rate if necessary. The serialization degree depends on the core frequency and modulation speeds. Common modulator data rates are 10Gb/s, and MRs enabling up to 40Gb/s have been demonstrated [37]. Core frequencies in CMPs are normally much lower to attain higher power efficiency, which is why data serialization is usually required. Finally, a specialized driver circuit provides an electrical signal to the optical modulator, modulating bits onto the modulator's optical resonance wavelength.
The modulated wavelengths traverse the waveguide until they are filtered at the receiver. Photodetectors are placed right behind the MR filters and convert optical signals into an electrical current. The electrical currents output by photodetectors are, however, well below the level required to drive voltages for operating digital logic, which is why amplifiers are necessary to regenerate and amplify these currents. The following steps of deserialization and decoding mirror the functionality of the serializer and encoder at the sender side, respectively. Although necessary, these steps do not introduce considerable latency and can typically be executed within one processor clock cycle [55].
In addition to direct optical links, MR filters can also implement switching functionality by filtering wavelengths from one waveguide and "dropping" them to another if they correspond to the filters' resonance wavelength, as illustrated in Figure 5. To steer a wavelength from one waveguide to another, the MR has to be placed between these two waveguides, with the input port at one waveguide and the drop port at the other. In the same fashion, DWDM signals can be dropped by implementing "filter banks" for switching, as shown in Figure 6. This makes it possible to implement any type of switch and network topology, and thus constitutes the basis for most of the ONoCs discussed later.

Optical Buses: The Main Architectural Building Block
While Figure 3 illustrates the simple interconnection of one sender-receiver pair, realistic many-core chips with core counts in the tens or hundreds require more sophisticated communication infrastructures, such as crossbars, buses, or NoCs. Being able to accommodate a number of different wavelengths on one waveguide provides a number of intriguing design opportunities. Figure 7 illustrates the basic optical bus architectures, each of which has different benefits and tradeoffs.

Basic Optical Buses.
The Single-Writer-Single-Reader (SWSR) bus constitutes a basic optical link between a sender and a receiver. As described earlier, the sender modulates its data packets on n wavelengths in parallel using DWDM.  On a Multiple-Writer-Single-Reader (MWSR) bus, each sender modulates its data on a dedicated, nonoverlapping subset of wavelengths, allowing multiple senders to transmit data simultaneously to a receiver without corrupting each other's data. With n wavelengths provided by the laser source, up to n senders could send data to the receiver, each on its own, unique wavelength. In a crossbar consisting of MWSR buses, each receiver would have its own, designated waveguide.
The Single-Writer-Multiple-Reader (SWMR) bus allows one sender to transmit data on the entire optical bandwidth to all receivers attached to the bus. In order to drive the optical signal to the photodetectors of all receivers, sufficient output power has to be provided at the laser source. Although this provides a compact and efficient broadcast network, it also leads to higher laser power.
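The nonoverlapping wavelength assignment of an MWSR bus can be illustrated with a short sketch. The function name and the even, contiguous split are assumptions made for illustration; the surveyed designs may assign wavelengths differently.

```python
def partition_wavelengths(n_wavelengths, n_senders):
    """Split wavelength indices 0..n-1 into disjoint subsets, one per sender.

    Hypothetical helper: an even, contiguous split is assumed here.
    """
    if n_senders < 1 or n_senders > n_wavelengths:
        raise ValueError("need between 1 and n_wavelengths senders")
    base, extra = divmod(n_wavelengths, n_senders)
    subsets, start = [], 0
    for sender in range(n_senders):
        # The first `extra` senders receive one additional wavelength.
        size = base + (1 if sender < extra else 0)
        subsets.append(list(range(start, start + size)))
        start += size
    return subsets


if __name__ == "__main__":
    # Eight wavelengths shared by four senders: two wavelengths each.
    print(partition_wavelengths(8, 4))
```

With as many senders as wavelengths, each sender ends up with exactly one wavelength, which corresponds to the limiting case of n senders described above.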

Control Network Assisted Optical Buses.
While the discussed buses offer intriguing possibilities for ONoC designs, a naive NoC implementation using just these buses was shown to be highly inefficient regarding laser power. SWSR buses merely enable point-to-point connections. Implementing topologies with SWSR buses is thus impractical since it would require a large number of buses, leading to high laser power, as we will discuss in the following section. In MWSR buses, either senders share the available bandwidth, which reduces the bandwidth per sender, or a considerable number of wavelengths have to be provided to offer each sender the same bandwidth as in an SWSR bus, which also leads to sharp laser power increases. SWMR buses improve power efficiency by allowing efficient data communication to multiple receivers simultaneously, without senders having to share bandwidth (as there is only one sender); however, in SWMR buses, the laser source has to drive all receivers of the bus simultaneously at all times, thereby requiring more output power to drive all photodetectors at sufficiently low bit error rates (10^-15 is acceptable for on-chip communication systems [86]). This increases laser power and may lead to high inefficiencies if too many receivers are attached to a bus.
Two essential bus design proposals tackle these problems, namely, the Multiple-Writer-Multiple-Reader (MWMR) bus [63] and the reservation-assisted SWMR (R-SWMR) bus [89]. Both proposals take advantage of the possibility of turning MR resonators on and off by using MR tuning to change the ambient temperature and thereby shift the MRs' resonance wavelengths, which can be done in less than 500ps [96].
Fig. 8. Reservation-assisted SWMR bus with x receivers and s supported packet lengths. The optical bandwidth required on the reservation bus to allow minimal, one-cycle data modulation of the reservation flit is (log x + log s)/2 wavelengths, assuming modulation speeds of twice the core data rate (e.g., 5GHz and 10Gb/s).
On the MWMR bus (see Figure 7), multiple senders and receivers are connected to the same waveguide and have MRs to send and receive on the entire available optical bandwidth. As simultaneously transmitting nodes would corrupt each other's data, bus arbitration has to be performed prior to data transmission, just as in traditional electrical on-chip buses. After the arbitration, only the nodes that take part in the communication tune in their MR filters, while all other nodes detune theirs to prevent interfering with the data transmission. Although this time-division-multiplexing (TDM) approach imposes latency overheads for bus arbitration, the power savings were shown to be tremendous and the latency and energy overheads of arbitration small [63]. The savings stem from (1) the wavelengths being shared among all nodes and (2) only one receiver having to be driven at any time.
The R-SWMR bus (see Figure 8) addresses the problem of high laser power in SWMR links introduced by a large number of receivers that the laser source has to drive. The ideal number of receivers for minimal laser power would be one. To ensure that this is the case at all times, the R-SWMR bus is supplemented with a separate, low-bandwidth SWMR bus on which the sender broadcasts a reservation packet prior to data transmission to inform all connected nodes about the prospective destination. The reservation flit contains the destination address and the packet length to notify the nodes about the destination and duration of the transmission, respectively. Initially, all nodes have their MR filters detuned on the data bus. Upon reception of the reservation flit, the destination tunes in its MR filters, while all other nodes keep theirs detuned. After data reception, the destination detunes its MRs again.
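The reservation handshake described above can be sketched as a small state model. The class and method names are hypothetical, and timing is reduced to abstract cycles; the sketch only shows that at most one receiver is tuned in on the data bus at any time.

```python
class RSWMRBus:
    """Toy model of the data bus of a reservation-assisted SWMR link.

    Hypothetical names; MR tuning delays and the reservation bus itself
    are not modeled.
    """

    def __init__(self, n_receivers):
        # Initially, every receiver has its MR filters detuned.
        self.tuned = [False] * n_receivers
        self.remaining = 0  # cycles left in the current transfer

    def reserve(self, destination, packet_cycles):
        """Broadcast a reservation flit: only the destination tunes in."""
        if self.remaining > 0:
            raise RuntimeError("data bus is busy")
        self.tuned[destination] = True
        self.remaining = packet_cycles

    def tick(self):
        """Advance one cycle; the destination detunes after the last one."""
        if self.remaining > 0:
            self.remaining -= 1
            if self.remaining == 0:
                self.tuned = [False] * len(self.tuned)

    def active_receivers(self):
        """Number of receivers the laser currently has to drive."""
        return sum(self.tuned)


if __name__ == "__main__":
    bus = RSWMRBus(n_receivers=4)
    bus.reserve(destination=2, packet_cycles=3)
    print(bus.active_receivers())  # one receiver is tuned in
```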
The latency and power overheads of using a separate SWMR bus prior to data transmission are small. Only very little bandwidth is required on the reservation bus since the reservation flits are small: (log x) bits are required to encode the destination ID and (log s) bits to encode the packet length, where log denotes the base-2 logarithm, x the number of receivers, and s the number of packet lengths supported by the NoC. The latter is typically very small (e.g., most CPU architectures have two packet sizes, one for cache line transfers and one for control and coherence traffic [63]). Most ONoCs assume 10Gb/s modulation speeds and a 5GHz core clock rate, which leads to (log x + log s)/2 wavelengths being required to modulate the reservation flit in one clock cycle.
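As a worked example of this estimate, the following sketch computes the number of reservation wavelengths. The function name is hypothetical, and rounding each term up to whole bits and whole wavelengths is an assumption added here so that the result is an integer for arbitrary x and s.

```python
import math


def reservation_wavelengths(n_receivers, n_packet_lengths,
                            modulation_gbps=10.0, core_ghz=5.0):
    """Wavelengths needed to send one reservation flit in one core cycle.

    Hypothetical helper; rounding up is an added assumption.
    """
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1.
    destination_bits = (n_receivers - 1).bit_length()
    length_bits = (n_packet_lengths - 1).bit_length()
    # Bits one wavelength can carry per core clock cycle (2 by default).
    bits_per_cycle = modulation_gbps / core_ghz
    return math.ceil((destination_bits + length_bits) / bits_per_cycle)


if __name__ == "__main__":
    # 64 receivers and 2 packet lengths: (6 + 1) / 2 = 3.5, rounded up to 4.
    print(reservation_wavelengths(64, 2))
```

For 64 receivers and two packet lengths, this yields four wavelengths, which illustrates how little optical bandwidth the reservation bus needs compared with the data bus.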

CRITICAL DESIGN ASPECTS OF OPTICAL NETWORK-ON-CHIP DESIGN
Although optical data transmission offers intriguing opportunities, such as high-bandwidth, low-energy data transmission and efficient broadcasting, a number of design challenges need to be addressed, both through technological advances and novel NoC architectures, to enable widespread adoption of ONoCs. While technological advances are outside the scope of this article, they have a significant impact on the NoC architecture and may alter design approaches as technology matures. This section outlines the benefits and tradeoffs of optical on-chip data transmission that designers must consider to make efficient use of its high-bandwidth capabilities. As electrical links are the technology to be replaced, we also compare optical to electrical links throughout this section to identify the cases in which they are particularly efficient or inefficient.

Laser Power
Laser power depends on a number of technological parameters, namely, device losses, laser efficiency, photodetector sensitivity, and crosstalk; however, the architecture of an ONoC also has a high impact on the resulting power budget. Designs should choose a topology and layout that minimize optical path losses, efficiently utilize the available number of wavelengths, and assign the number of receivers carefully. In addition, deciding between an on-chip or an off-chip laser source requires a detailed architectural consideration in the ONoC design too.

Optical Path Loss.
Losses of SiP devices degrade the optical signal and require the laser source to provide more output power to drive all receivers at satisfactory bit-error rates. The optical path that introduces the highest insertion loss (IL_max), that is, the highest signal degradation, determines the laser output power per wavelength. Loss values of current devices require a very careful design of optical links to avoid excessive laser power. Although devices are constantly evolving, there is no clear roadmap for SiP; however, optical losses are commonly expected to remain a crucial issue and must therefore be addressed rigorously in the NoC design. Figure 9 illustrates the different components that contribute to IL_max, while Table 1 lists measurement values.
Coupling light of an off-chip laser into the chip introduces significant losses, and different coupling methods have been proposed, with lateral [33] and vertical [23] techniques being considered the most promising. For an overview on coupling techniques, we refer the reader to [8].
Filtering/dropping a wavelength ("Ring: Drop" loss) introduces considerable losses, and while it is indispensable at the receiver side, dropping a wavelength to perform switching can be minimized by smart network designs. "Ring: Through" losses are much lower per MR than "Ring: Drop" losses; however, they can become significant depending on the number of MRs passed by an optical signal. For instance, considering the SWMR bus in Figure 7, for wavelength λ_n to reach receiver x, it has to pass (n × x) MR filters. Waveguide bending may be required in the physical layout to route waveguides and also introduces losses. Waveguide crossings are often difficult to avoid in the NoC layout and must also be carefully considered as they incur additional losses and crosstalk. Optical splitters are used to distribute an optical stream over a number of waveguides and introduce splitter losses. This becomes increasingly critical in NoCs where a large number of links exist but only a few laser sources can be coupled into the chip due to packaging constraints.
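How the worst-case path determines the laser output power per wavelength can be sketched with a simple link budget. This is a simplification under stated assumptions: the required laser output in dBm is taken as the photodetector sensitivity in dBm plus IL_max in dB, crosstalk penalties and safety margins are ignored, and all function names and loss values are hypothetical placeholders rather than the values of Table 1.

```python
def path_loss_db(component_loss_db, component_counts):
    """Sum the losses (dB) of all components an optical path traverses."""
    return sum(component_loss_db[name] * count
               for name, count in component_counts.items())


def laser_power_per_wavelength_mw(il_max_db, detector_sensitivity_dbm):
    """Optical output power (mW) needed so the worst-case path still
    delivers the detector's minimum input power.

    Assumed link budget: required dBm = sensitivity dBm + IL_max dB.
    """
    required_dbm = detector_sensitivity_dbm + il_max_db
    return 10 ** (required_dbm / 10)


if __name__ == "__main__":
    # Hypothetical per-component losses in dB (placeholders only).
    losses = {"coupler": 1.0, "ring_through": 0.01, "ring_drop": 1.0,
              "crossing": 0.05}
    # Hypothetical worst-case path: 1 coupler, 256 rings passed,
    # 1 drop at the receiver, 10 waveguide crossings.
    path = {"coupler": 1, "ring_through": 256, "ring_drop": 1, "crossing": 10}
    il_max = path_loss_db(losses, path)
    print(il_max, laser_power_per_wavelength_mw(il_max, -20.0))
```

Because losses add in dB, every additional decibel on the worst-case path multiplies the required laser output by roughly 1.26.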

Technological Implications.
Next to device losses, laser efficiency, photodetector sensitivity, and crosstalk also influence the required output laser power and have thus received much attention from the community in recent years:
Photodetector sensitivity denotes the electrical output per optical input (i.e., the efficiency of converting photons into electrons) and is determined by the device technology. Various technologies have been proposed in recent years, and Bergman et al. provide a review of different photodetector technologies [8].
Laser wall-plug efficiency (L_e) is the ratio of the optical output power to the electrical input power of the laser source [86]. For instance, with an L_e of 25%, 4 watts of electrical power are required to generate 1 watt of optical power. Current laser technologies do not exhibit high efficiencies (at most 30% [34]). Given the significance of L_e for laser power, advanced lasers with improved L_e have been an active research area.
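The wall-plug relation can be written as a one-line helper. The function name is hypothetical; the relation itself follows directly from the definition of L_e given above.

```python
def electrical_laser_power_w(optical_power_w, wall_plug_efficiency):
    """Electrical input power (W) needed for a given optical output (W)."""
    if not 0 < wall_plug_efficiency <= 1:
        raise ValueError("wall-plug efficiency must be in (0, 1]")
    return optical_power_w / wall_plug_efficiency


if __name__ == "__main__":
    # The example from the text: 25% efficiency, 1 W optical output.
    print(electrical_laser_power_w(1.0, 0.25))  # 4.0
```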
Crosstalk is caused by undesired mode coupling (both inter- and intrachannel) in silicon waveguides and MRs and degrades optical signals, leading to lower signal-to-noise ratios and in turn requiring more output power at the laser source [8]. Crosstalk between waveguides can be avoided by ensuring sufficient spacing but is more difficult to prevent at waveguide crossings or switching stages. While small at the device level, the accumulated effects of each device on paths in ONoCs can become significant, and numerous studies have systematically examined the impact of crosstalk in ONoCs [35,79,80]. Apart from the ONoC proposals STMR [62] and CWA [16], none of the ONoC architectures proposed in the literature were explicitly designed with the main goal of minimizing crosstalk; however, most studies acknowledge that crosstalk increases laser power (modeling tools like DSENT include the impact of crosstalk in the power model [103]) and thus allocate the number of wavelengths per waveguide carefully.

Number of Wavelengths.
The number of wavelengths has a direct impact on the power required at the laser source. To illustrate their dependency, we modeled laser power with DSENT [103] using the (conservative) IL parameters listed in Table 1. We plot the results in Figure 10 for a basic SWSR bus. It is interesting to observe that the relationship between laser power and number of wavelengths is exponential rather than linear. As described in the previous paragraph, this is due to increased optical losses caused by MR through losses (IL_max) and crosstalk noise, which increases along with the number of wavelengths in a waveguide [8].
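The superlinear growth can be reproduced qualitatively with a toy model. This is not the DSENT model: it assumes that each wavelength on an SWSR bus passes roughly 2n off-resonance rings (n modulators and n filters), ignores crosstalk, and uses hypothetical default parameters rather than the values of Table 1.

```python
def total_laser_power_mw(n_wavelengths, through_loss_db=0.01,
                         fixed_loss_db=5.0, sensitivity_dbm=-20.0):
    """Toy estimate of the total optical laser power of an SWSR bus (mW).

    All default values are hypothetical placeholders.
    """
    # Through losses add up in dB, so IL_max grows linearly with n ...
    il_max_db = fixed_loss_db + 2 * n_wavelengths * through_loss_db
    # ... which makes the power per wavelength grow exponentially in mW.
    per_wavelength_mw = 10 ** ((sensitivity_dbm + il_max_db) / 10)
    return n_wavelengths * per_wavelength_mw


if __name__ == "__main__":
    for n in (16, 32, 64, 128):
        print(n, total_laser_power_mw(n))
```

In this model, doubling the number of wavelengths always more than doubles the total laser power, because the per-wavelength power rises together with the wavelength count.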

Number of Readers.
The effects of an increased number of wavelengths and IL_max on laser power are further aggravated when more than one reader is attached to an optical bus, as shown in Figure 11. Increasing the number of readers leads to higher laser power as the laser source has to drive more photodetectors, and this becomes increasingly critical with higher optical bandwidth. It is therefore desirable to keep the number of readers and wavelengths of an optical bus as low as possible. As these two design aspects also determine available bandwidth, trading them off efficiently is crucial for low-power, low-latency designs.

On-Chip Versus Off-Chip Laser Sources.
The task of achieving optimum power efficiency requires ONoC designers to decide between the use of on-chip or off-chip lasers, which both have their benefits and drawbacks.
Off-Chip Lasers. The main benefits of off-chip lasers are that they offer fairly high device maturity (higher yield), can be tested independently, and achieve higher L_e (up to 30% for Gaussian comb lasers [34]) than on-chip lasers. Coupling the external laser into the chip, however, entails drawbacks and challenges. Attaching the fibers of the laser sources to the chip is currently a complex task since the single-mode fibers to be coupled have a 1,000× mismatch in cross-sectional area compared to the on-chip waveguides [8]. Coupling complexity is known to make packaging more costly and challenging. Moreover, coupling losses degrade the optical signal significantly, making off-chip lasers effectively less than 30% efficient [32].
On-Chip Lasers. The field of on-chip laser technologies is fairly young but constantly evolving. Indium phosphide (InP)-based lasers provide a small footprint and high reliability but are only able to output one wavelength, so a large number of them would be required for DWDM links [19].
VCSELs [115] allow direct modulation at high modulation speeds. However, their operating wavelength is set by the epitaxial growth; that is, an array of VCSELs is required to satisfy the DWDM demands of ONoCs, which is currently considered an impractical approach [31]. Out of a number of DWDM-compatible multiwavelength lasers (e.g., [58] [71]), Germanium (Ge)-based solutions [15] currently offer the most promising technology as they can be built with standard-width waveguides and operate at room temperature. O'Connor et al. provide a more exhaustive overview on current laser technologies [86].
The main advantages of on-chip lasers are that they do not need light to be coupled into the chip (no coupling losses and easier packaging), enable batch processing (reduced manufacturing cost), can achieve much higher integration density, and allow for fast switch on/off times, which is the basis for efficient adaptive laser mechanisms (we discuss this in more detail in Section 6) [29] [8]. However, integrating lasers on-chip poses a number of challenges: laser efficiency is typically lower than in off-chip lasers (the highest demonstrated device offers 12.2% [61]) and decreases as the ambient temperature on chip rises. Chen et al. address this issue by strategically placing lasers to minimize temperature impact, for example, in a separate layer above L2 caches (which dissipate less heat than processors) [20]. Also, integrating a large number of lasers is technologically challenging due to thermal, crosstalk, and placement/layout constraints, which is why efficient sharing methods have been explored [19]. Although under active investigation, on-chip lasers are currently a less mature technology, leading to low manufacturing yield, which in turn increases cost.

Microring Tuning.
MRs respond to one particular wavelength based on their geometry and ambient temperature. MR tuning (or "trimming") is required to mitigate temperature variations and postmanufacturing geometric mismatches, which can cause the resonant wavelength of MRs to shift, resulting in incorrect behavior. An MR's resonant wavelength can be shifted toward the red through heating, using resistive heaters integrated alongside the MR, or toward the blue through current injection [38]. Since MRs form the basis of most ONoCs, appropriate tuning is necessary to ensure correct network functionality. This section reviews state-of-the-art MR tuning techniques and their impact on recently proposed ONoC architectures.
The assumptions on MR tuning power vary across recent ONoC proposals. A large number of recent ONoC proposals estimate the total tuning power by multiplying a fixed assumed tuning power per MR by the total number of MRs in the ONoC (e.g., [90] [54] [42] [113]). These studies assume a temperature range of 20 K and 1μW/K tuning per MR. Other studies assume 16μW/K per MR with a temperature range of 10 K [20]. Instead of shifting an MR's resonant wavelength to its original wavelength channel, Georgas et al. propose to shift it merely to the next closest channel and perform a bit-reordering method to maintain correct functionality [38]. Athermal MR devices capable of maintaining correct functionality in the face of temperature variations have also been demonstrated [41], most recently even with CMOS-compatible fabrication processes [36]. Although they currently exhibit a fairly large footprint (25μm radius [36]), advances in these devices are very exciting as they eliminate MR heating altogether.
Nitta et al. [84] showed that tuning using current injection is highly sensitive to MR count, and instability and thermal runaways can happen easily. Utilizing heating only is thus considered the more practical approach [30] and has been assumed in previous studies [47]. Without a mechanism to tune MRs back toward the blue, MRs must be designed to operate at temperatures higher than the electronic layer could ever raise them. Parka [30] proposes to place an insulation layer in a 3D stacked chip to isolate the MRs from the heat dissipated on the electrical layer. This simple approach shows highly promising results as it can lower the heating power per MR by ∼4× and ∼5× for two different die cooling techniques. Although the vast majority of recent publications on ONoCs assume a fixed value for tuning per MR, studies have shown that determining MR tuning power is actually more complex than assuming a perfectly linear relationship between MR count and tuning power, and is significantly impacted by die area, ambient temperature of the chip, and the rate at which heat can be transferred outside the chip [84]. In the absence of fully integrated power/thermal simulation environments, however, assuming a fixed per-MR tuning power is considered to provide reasonable estimates [84]. Besides, we note that technological aspects, such as the thermal conductivity of the material surrounding the MR and its thermal tuning coefficient, influence MR tuning power too. For more details on MR tuning technologies, we refer the reader to [8] and [102].
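The fixed per-MR estimate used by many proposals amounts to a single multiplication, sketched below. The function name and the ring count in the example are hypothetical; the default coefficient and temperature range are the ones quoted above (1μW/K over 20 K), and the linear model is the simplification discussed in this section, not a thermal simulation.

```python
def mr_tuning_power_w(n_rings, tuning_uw_per_k=1.0, temperature_range_k=20.0):
    """Total MR tuning power (W) under the fixed per-MR assumption."""
    per_ring_uw = tuning_uw_per_k * temperature_range_k  # 20 uW by default
    return n_rings * per_ring_uw * 1e-6


if __name__ == "__main__":
    # Hypothetical ONoC with 10,000 MRs: roughly 0.2 W of tuning power.
    print(mr_tuning_power_w(10_000))
    # Same ring count with 16 uW/K over a 10 K range: roughly 1.6 W.
    print(mr_tuning_power_w(10_000, tuning_uw_per_k=16.0,
                            temperature_range_k=10.0))
```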

Impact on ONoC Architectures.
Although the total MR tuning power depends on a number of factors on the technology level, ONoC designers can help reduce MR tuning power by devising ONoC designs that minimize the number of MRs while maintaining network bandwidth. In this section, we give the reader a sense of the impact MR tuning power has on the total power consumption. Table 2 lists MR tuning and laser power values for a number of recently proposed hybrid (electrical and optical links) and all-optical NoCs for 64 nodes, along with the number of MRs of each design (as reported in the corresponding publications). We list the MR tuning power values for 20μW/MR and 5μW/MR (a speculative value possibly attained in the future based on the 4× tuning reduction of an insulation layer [30]). In the last two columns, we calculated the shares that MR tuning power and laser power contribute to the total optical power in the considered designs. For instance, in Corona, MR tuning power contributes 20% to the total optical power for 5μW/MR, and laser power 80%. In this particular case, laser power consumes the majority of power. We believe this assessment is useful to identify the potential of possibly attainable power reductions by advanced technologies and to put them into relation to electrical NoCs.
The number of MRs is in the tens of thousands for most designs. Hybrid NoCs generally require fewer MRs as their topologies are supported by electrical links (apart from Atac, which has a global crossbar). These numbers are necessary to provide throughput levels similar or even superior to those of electrical NoCs for 64 nodes. As the number of on-chip cores is expected to exceed 64, the requirements on the MR count will further increase. For 20μW/MR, MR tuning power plays a larger role than laser power in most designs (∼60%). This tendency, however, shifts tremendously for 5μW/MR, where laser power consumes the vast majority of the total optical power. At this point, novel technologies such as adaptive and efficient on-chip lasers and synthesis tools minimizing IL_max will become significant to keep up with the advances in MR tuning power.
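The shares discussed above follow from simple arithmetic once a fixed per-MR tuning power is assumed. The following Python sketch illustrates the calculation; the MR count and laser power used in the example are hypothetical placeholders chosen for illustration and are not taken from Table 2.

```python
def optical_power_shares(num_mrs, tuning_uw_per_mr, laser_w):
    """Return (tuning_share, laser_share) of the total static optical power.

    Assumes total tuning power scales linearly with MR count, i.e., the
    fixed per-MR simplification (an approximation, see the discussion of [84]).
    """
    tuning_w = num_mrs * tuning_uw_per_mr * 1e-6
    total_w = tuning_w + laser_w
    return tuning_w / total_w, laser_w / total_w


# Hypothetical design point (NOT from Table 2): 40,000 MRs, 0.8 W laser power.
for per_mr_uw in (20, 5):
    tuning, laser = optical_power_shares(40_000, per_mr_uw, 0.8)
    print(f"{per_mr_uw:2d} uW/MR: tuning {tuning:.0%}, laser {laser:.0%}")
```

With these placeholder values, lowering the per-MR tuning power from 20μW to 5μW moves the tuning share from one-half to one-fifth, which is the kind of shift toward laser-dominated power described above.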
To put the power values listed in Table 2 into perspective, we estimated the power consumption of a 64-node electrical mesh with DSENT [103]. The total power of a 2D mesh at moderate injection rates (∼1Tbps) with synthetic random traffic is 0.28W, 0.75W, and 4.5W for the 11nm, 22nm, and 45nm technology libraries, respectively. Some of the hybrid NoC proposals already beat these values, for example, Meteor (0.235W). It can therefore be expected that supplementing electrical NoCs efficiently with optical links can indeed be implemented at lower power budgets. For all-optical NoCs, it can be observed that advancements regarding laser power will be required to be able to compete with electrical NoCs at 11nm or 22nm; however, as we will describe in the following sections, adaptive laser sources and synthesis tools can decrease laser power by up to 90% each, which would bring ONoCs closer to reality. We note that dynamic power, typically a minor contributor to power in ONoCs, is not listed in Table 2 for clarity.
The majority of recently proposed designs provide studies with NoCs up to 64 nodes. If larger network sizes are considered (e.g., Atac or Firefly), clustering is utilized as a means to scale the network topology. From this perspective, it would be intriguing to conduct scalability studies of ONoCs since the lack of scalability is one of the main issues in electrical NoCs. From a latency perspective, ONoCs are ideal to perform long-distance communications; however, longer paths lead to higher IL_max, and the number of nodes attached to a waveguide should be kept low to keep laser power manageable. The benefits and tradeoffs of SiP for many-core systems of larger scales would thus be worth investigating.

Energy Consumption
Global electrical wires have become increasingly energy hungry in CMPs [2] as they require repeaters, regenerators, or buffers to provide satisfactory signal integrity and latency, with increasing energy consumption for longer link lengths. Although leakage currents (i.e., static power) have gained importance due to shrinking transistor sizes in recent technology nodes, dynamic power still plays a dominant role [103] (in contrast to optics, where static power dominates). The energy required to transmit bits has thus become a crucial factor in electrical NoC design, and the low, distance-independent energy of optical data transmission has become an interesting alternative. Figure 12 shows the difference in energy per 64-bit flit over an electrical and an optical link with increasing link length, modeled in DSENT with a 22nm technology. For short distances, the electrical link is more energy efficient as it does not require E/O and O/E conversions. However, for link lengths greater than 1mm, the almost distance-independent energy consumption of optical data transmission makes optical links more efficient than electrical links. From an energy perspective, it is therefore beneficial to utilize electrical links for destinations closer than ∼1mm. For instance, in a 64-core chip, tile widths/lengths are often between 1 and 2mm for common die sizes of 225mm² (e.g., [11]). This would mean that only communication to direct neighbors should be electrical. A trend toward increasing core counts and die sizes would therefore make optical data transmission increasingly superior in terms of energy/dynamic power, particularly for communication between nodes at large distances. In addition, router traversal of a 64-bit flit in 22nm at 5GHz requires ∼2pJ, which is similar to the energy needed to traverse an electrical link of 1.3mm, further emphasizing the significance electrical links have on the total energy consumption of NoCs.
Optical interconnects offer superior dynamic power consumption compared to electrical interconnects thanks to their low, distance-independent energy consumption; however, static power (MR tuning and laser) dominates the power budget of optical links, which is why many studies discussed in this survey have aimed at maximizing link utilization in ONoCs (e.g., see Section 4.3).
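The crossover between electrical and optical links can be approximated with a back-of-the-envelope model. The Python sketch below assumes that electrical link energy grows linearly with length and that optical link energy is constant; both are simplifications of the DSENT results. The per-millimeter value is derived from the ∼2pJ per 1.3mm figure quoted above, whereas the optical energy per flit is an assumed placeholder chosen only so that the crossover lands near ∼1mm.

```python
# Derived from the text: ~2 pJ corresponds to ~1.3 mm of electrical link (64-bit flit).
ELECTRICAL_PJ_PER_MM = 2.0 / 1.3

# Assumption (not from the text): constant optical energy per 64-bit flit.
OPTICAL_PJ_PER_FLIT = 1.5


def electrical_link_energy_pj(length_mm, pj_per_mm=ELECTRICAL_PJ_PER_MM):
    """Linear model of the electrical link energy per flit."""
    return pj_per_mm * length_mm


def crossover_length_mm(optical_pj=OPTICAL_PJ_PER_FLIT,
                        pj_per_mm=ELECTRICAL_PJ_PER_MM):
    """Link length beyond which the optical link needs less energy per flit."""
    return optical_pj / pj_per_mm


def cheaper_link(length_mm):
    """Pick the lower-energy option for a given link length."""
    if electrical_link_energy_pj(length_mm) <= OPTICAL_PJ_PER_FLIT:
        return "electrical"
    return "optical"


print(f"crossover at ~{crossover_length_mm():.2f} mm")
for mm in (0.5, 1.5, 10.0):
    print(f"{mm:4.1f} mm -> {cheaper_link(mm)}")
```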

Latency
Electrical signal propagation takes 131ps/mm in an optimally repeated wire at 22nm [44]. At 5GHz, one hop over an electrical link in an NoC is therefore commonly accepted to take one clock cycle (we note that this is also subject to clock frequency, layout, final link lengths, etc.). Optical links, on the other hand, require E/O and O/E conversions and signal propagation delay in the waveguide (t_prop), which take at least one clock cycle each (three cycles in total). Signal propagation of light in silicon waveguides, however, has been identified to be 10.45ps/mm (based on models utilizing ITRS predictions [43]), which is particularly beneficial for long-distance communication, especially because optical links, as opposed to electrical links, neither require pipelining to drive and/or speed up the signal nor introduce further distance-dependent latencies. Figure 13 plots the transmission delay on an optical link versus the link length. In addition to waveguide propagation delay, latency includes the delay for the E/O backend (9.5ps), modulator (3.1ps), detector (0.22ps), and O/E backend (4.0ps) [22,45]. We observe that distances/link lengths on-chip have very little impact on the overall latency of optical links. The major contributor, in fact, is data modulation, that is, the time it takes to serialize a packet based on the available bandwidth and link data rate. This is outlined in Figure 14, which lists the impact on the delay of different packet sizes common in the on-chip domain, with different numbers of wavelengths typically required in ONoCs, assuming link propagation delay and delay through the backend circuitries of one cycle for simplicity, a link data rate of 10Gb/s (modulators/detectors), and 5GHz core clock frequency. With this configuration, one wavelength (λ) can modulate 2 bits in one core clock cycle, leading to a modulation delay of one core clock cycle for a 64-bit flit with 32λ.
These values are an important guideline in order to ideally trade off power required for increasing the number of λs (see previous section) and latency. For instance, increasing link bandwidth from 16λ to 32λ decreases latency only by one clock cycle, but more than doubles laser power (see Figure 10). Bandwidths lower than 8λ introduce too much latency for too little power benefits. This illustrates that designers should always carefully balance optical bandwidth and laser power.
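The delay figures quoted in this section can be reproduced with a few lines of arithmetic. The Python sketch below uses only the per-component delays, the 10Gb/s link data rate, and the 5GHz core clock stated in the text; the function names are illustrative, and rounding the serialization time up to whole core cycles is an assumption.

```python
import math

# Per-component delays quoted in the text (ps).
FIXED_DELAY_PS = {"eo_backend": 9.5, "modulator": 3.1,
                  "detector": 0.22, "oe_backend": 4.0}
WAVEGUIDE_PS_PER_MM = 10.45   # light in a silicon waveguide
WIRE_PS_PER_MM = 131.0        # optimally repeated electrical wire at 22nm


def optical_flight_delay_ps(length_mm):
    """Device delays plus waveguide propagation, excluding serialization."""
    return sum(FIXED_DELAY_PS.values()) + WAVEGUIDE_PS_PER_MM * length_mm


def serialization_cycles(packet_bits, num_lambdas, link_gbps=10.0, core_ghz=5.0):
    """Core clock cycles needed to modulate a packet onto num_lambdas wavelengths."""
    bits_per_cycle = num_lambdas * link_gbps / core_ghz  # 2 bits per wavelength here
    return math.ceil(packet_bits / bits_per_cycle)


for mm in (1, 10, 20):
    print(f"{mm:2d} mm: optical {optical_flight_delay_ps(mm):6.1f} ps, "
          f"wire {WIRE_PS_PER_MM * mm:7.1f} ps")
for lambdas in (8, 16, 32):
    print(f"{lambdas:2d} wavelengths: 64-bit flit serialized in "
          f"{serialization_cycles(64, lambdas)} cycle(s)")
```

Under these assumptions, a 64-bit flit needs four cycles with 8λ, two cycles with 16λ, and one cycle with 32λ, which matches the observation that doubling the bandwidth from 16λ to 32λ saves only a single cycle.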
To achieve minimum packet latencies, these delays must be carefully compared to the electrical delay. Although electrical links do not need E/O and O/E conversions, the only energy-efficient way of reaching distant cores is through several hops in a topology, which introduces router delay and contention in the NoC. Router traversal delay depends on the clock frequency, where high clock frequencies of 5GHz may need up to five pipeline stages (e.g., Intel's TeraFLOPS design [107]). If we assume aggressively pipelined routers that can be traversed in two clock cycles (assuming enough link bandwidth), one hop would take three cycles. While this delay adds up for each additional hop to reach a destination, hardly any delay is added on optical links when the distance increases (assuming direct connections). Optical links are thus increasingly superior in terms of latency and energy for large distances.
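The comparison above can be sketched as a simple cycle count. The Python sketch below assumes three cycles per electrical hop (two for the router, one for the link) and three fixed cycles for a direct optical link (E/O conversion, propagation, O/E conversion) plus serialization at 2 bits per wavelength per cycle. Contention, arbitration, and network interface delays are deliberately ignored, so the results are only indicative.

```python
import math


def electrical_path_cycles(hops, router_cycles=2, link_cycles=1):
    """Zero-load latency of a multi-hop electrical path."""
    return hops * (router_cycles + link_cycles)


def optical_path_cycles(packet_bits, num_lambdas,
                        bits_per_lambda_per_cycle=2, fixed_cycles=3):
    """Zero-load latency of a direct optical link, including serialization."""
    serialization = math.ceil(packet_bits / (num_lambdas * bits_per_lambda_per_cycle))
    return fixed_cycles + serialization


def break_even_hops(packet_bits, num_lambdas):
    """Smallest hop count at which the direct optical link is strictly faster."""
    optical = optical_path_cycles(packet_bits, num_lambdas)
    hops = 1
    while electrical_path_cycles(hops) <= optical:
        hops += 1
    return hops


print(break_even_hops(64, 32))   # 64-bit flit on 32 wavelengths
print(break_even_hops(64, 8))    # 64-bit flit on 8 wavelengths
```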

Physical Layout
Integrating SiP for on-chip optical data transmission in the near future will likely be performed through 3D integration, with optical components placed on separate optical layers and interfaced by the electrical components through through-silicon vias (TSVs), as depicted in Figure 15. Although monolithic integration has been demonstrated, accommodating both SiP and electronic devices on the same die is more challenging due to the physical interactions of these devices and the different effects temperature variations have on them [8]. The physical implementation of SiP is a challenging task that is subject to a number of constraints. Engineers implementing optical links need to be aware of the state of the art of device technologies, their impact on power, packaging constraints, and feasibility.
Manufacturing SiP is still a niche market and usually imposes tight constraints that designers need to be aware of. As discussed in Section 3.1.5, one of the main contributors to the cost of chip packaging is laser source coupling. Therefore, the number of laser sources available is usually limited, which has to be considered by designers in the NoC topology. For instance, designing topologies that require a large number of links that need to be provided with a separate set of wavelengths leads to large amounts of splitting, and thus higher insertion loss. Also, based on the coupling point of the laser and NoC layout, potentially large distances must be traversed on chip to provide a waveguide with light. Moreover, the more laser sources are required by the system, the higher the cost. It is therefore essential for design efficiency that a detailed analysis of the physical implementation is conducted when proposing a (logical) topology.
The placement and spacing of SiP components is important to mitigate crosstalk noise. Although advances in materials and device technologies lead to compact SiP components (e.g., MR diameters of 5μm [8]), the spacing required between these components makes their placement and layout nontrivial. If every tile has to be provided with modulators and receivers for optical communication, sufficient space must be available to place and interface these devices. Recent work assumes 5μm clearance between MRs [63], which ultimately limits the number of MRs that can be provided to each tile with common tile dimensions of 1 to 2mm. As mentioned before, integrating optical components on-chip is widely envisioned to be implemented by placing them on a separate layer using 3D integration; however, while this decreases the interferences between the SiP and CMOS components, the spacing limitations still exist as the optical components have to be interfaced accordingly with TSVs.
Apart from spacing issues, routing waveguides should avoid excessive waveguide crossings since they increase IL_max. Recent studies have compared a number of different ONoC designs and revealed, however, that minimizing waveguide crossings in the layout can lead to longer waveguide lengths and in turn propagation losses [98]. In addition, it is important to study how logical topologies can be mapped to a physical layout as the number of unavoidable waveguide crossings also depends on the topology. Finding the ideal physical layout also depends on the utilized device technologies, in particular their loss values: for instance, for technologies with high losses for waveguide crossings and low waveguide propagation losses, it may be more efficient to implement longer waveguides if that allows one to minimize the number of waveguide crossings. Given the young age of ONoCs, many past proposals required designers to find an efficient/ideal physical layout for a given topology manually. This led to the emergence of numerous automatic synthesis and layout tools that assist designers in exploring the design space, minimizing IL_max, and in turn significantly improving ONoC designs [12], which will be discussed in more detail in Section 7.

Reliability
With the susceptibility of SiP to temperature and process variations, coupled with the low maturity level of SiP manufacturing processes (compared to standard CMOS), reliability is a crucial concern for widespread adoption. Correct switching and filtering of wavelengths on optical paths requires MRs to respond to the correct resonance wavelengths, which in turn depends on appropriate dimensioning of the MRs. Fabrication mismatches can thus result in performance degradation or complete system failure. A number of studies have therefore been dedicated to analyzing, formalizing, and modeling the frequency of occurrence and the extent of the effect that process variations can have, ranging from detailed device-level studies to system-level analyses [66,81,82,83]. In addition, reliability-aware design flows for SiP have been proposed [75].
However, only a few ONoC architectures in the literature (the focus of this article) explicitly support fault tolerance. Nitta et al. [85] evaluate different schemes for improving reliability (e.g., retransmission versus error correction) for optical links based on MRs by injecting varying fault rates. They conclude that error detection schemes will almost certainly be needed in large-scale NoCs with MRs. R-3PO [78] is a 3D ONoC that offers a dynamic reconfiguration mechanism capable of reallocating bandwidth around faulty channels to maintain correct operation and graceful performance degradation in the face of faulty links. Aurora [67] is a thermally resilient ONoC that handles thermal variations on different system layers, from varying the bias current through MRs for small temperature variations to rerouting messages away from hot regions for high-temperature variations.
Fig. 16. Folded crossbar [98]. Fig. 17. Snake [98].
Reliability is an issue that is here to stay considering the current state of device technologies. Moreover, since SiP is a nascent technology with low manufacturing volumes, fabricating chips with SiP requires a substantial investment, and appropriate measures must be taken to maximize yield [78].

ALL-OPTICAL NETWORKS-ON-CHIP
All-optical NoCs exclusively utilize optical links to implement the communication between all tiles on the chip. Design proposals in the literature revolve around the question of how to deliver good performance while keeping static power overheads and optical resource requirements low, both critical aspects for obtaining higher efficiency than electrical NoCs. A number of different approaches have been proposed, from early first steps of simple bus designs [51,92] to sophisticated designs utilizing wavelength sharing, bus arbitration, and TDM, which will be discussed in this section.

Wavelength-Routed Optical NoCs
Wavelength-routed optical NoCs (WRONoCs) route optical signals through the network based on their wavelength. To avoid data corruption of two optical signals on the same wavelength, appropriate MR filters for switching have to be implemented to provide collision-free paths between any source-destination pair. This section covers the first, basic design proposals and their advantages and disadvantages. A number of different design approaches have been implemented on top of these basic designs to tackle their inherent weaknesses through TDM or wavelength sharing, which also belong to the realm of WRONoCs. These will be discussed in the following sections.
Collision-free, all-to-all communication requires an optical crossbar. This could simply be implemented by using SWSR, SWMR, or MWSR buses with one bus assigned to each sender (SWSR, SWMR) or receiver (MWSR); however, this scales the waveguide count linearly with the number of nodes, causes large numbers of MRs, and overprovisions optical bandwidth, which leads to unacceptable power overheads. WRONoCs that implement switches using MR filters alleviate these issues as optical bandwidth and waveguides can be reused by efficient topologies. Several WRONoC topologies have been proposed, such as the λ-Router [14], Snake, or Folded Crossbar [98]. Figures 16 and 17 illustrate the latter two designs (redrawn from [98]). MR filters are placed as shown in the figures to drop wavelengths so that the optical signals are forwarded to the destinations. Based on the destination, senders will choose the corresponding wavelength assigned to the destination for data modulation. The number of wavelengths thus equals the number of destinations in the network, and every sender needs to have modulators to be able to address each destination.
Moreover, more than one wavelength is required to provide the NoC with sufficient throughput; therefore, a set of wavelengths (λ-set) is assigned to each node for data transmission. The main limitation of these approaches is the quadratic scaling of MR filters to provide the contention-less switching of the crossbar, which induces large overheads in MR tuning power.
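To give a feeling for how these requirements grow, the following Python sketch counts the wavelengths and modulator MRs implied by the description above: one λ-set per destination, and modulators at every sender for every other destination. It is a counting model only; the number of MR filters for switching depends on the specific topology and is not modeled, and excluding a node's own λ-set is an assumption.

```python
def wronoc_resources(num_nodes, lambdas_per_set):
    """Return (wavelengths, modulators) for a basic wavelength-routed crossbar.

    wavelengths: one λ-set per destination.
    modulators:  each sender addresses every other node (assumption: a node
                 does not need modulators for its own λ-set).
    """
    wavelengths = num_nodes * lambdas_per_set
    modulators = num_nodes * (num_nodes - 1) * lambdas_per_set
    return wavelengths, modulators


for nodes in (16, 64, 256):
    wl, mod = wronoc_resources(nodes, lambdas_per_set=8)
    print(f"{nodes:3d} nodes: {wl:5d} wavelengths, {mod:7d} modulator MRs")
```

The modulator count grows with the square of the node count, which illustrates why MR tuning power becomes the limiting factor for these designs.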
A more sophisticated WRONoC for 3D mesh-based ONoCs has been proposed that allows for a regular topology and lightweight, nonblocking optical routers that enable dimension-order routing as known from electrical NoCs [119]; however, MR requirements are still large, and relying on a high amount of wavelength switching considerably contributes to the overall path losses and in turn laser power. Aurora [67] provides a mesh structure but suffers from similar power overheads.
Recent work studied whether topologies that use switching elements (i.e., MR filters) to perform routing are necessary, or whether ring topologies that rely on spatial division multiplexing (i.e., communication spread across several different waveguides) and require very few waveguide crossings can provide higher efficiency [98]. Their results reveal that optical ring topologies, although simpler, offer poor scalability with regard to waveguide lengths (and in turn propagation losses) and have high connectivity requirements, which leads to higher power requirements, especially as the number of nodes increases. Ring topologies should therefore be used with caution.
All in all, these ONoC designs lead to very limited scalability due to high MR tuning power, which renders these approaches infeasible for a larger number of cores. This has been identified by the research community and has been tackled by a number of proposals that utilize different forms of resource sharing, as we will discuss now.

Control-Network-Based WRONoCs
Laser and MR tuning power directly depend on the number of wavelengths (more laser sources, higher losses) and MRs (more per-MR tuning) in the network. WRONoCs, discussed in the previous section, have demands in both of these metrics with inefficient scalability as they provide all-to-all contention-free paths through MR switches only and need one dedicated wavelength set for each destination. A number of proposals identified this problem and proposed the use of a separate control network on which destinations have to be reserved before a sender can start data transmission [42,53,54,112,114]. This is performed by exchanging request (REQ) and acknowledgment (ACK) packets between senders and receivers and allows collision-free operation without the need for providing all-to-all collision-free paths through switching since senders can share bandwidth and waveguides to transmit data. Communication without data collision is guaranteed because only one sender will send data on a λ-set to a destination at any given time. Results show that this technique reduces the number of MRs for switching dramatically.
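The reservation step can be sketched as a small state machine. The Python sketch below models only the bookkeeping: a destination's λ-set is granted to one sender at a time, so data transmissions never collide. The class and method names are hypothetical, and the explicit release step as well as the denial of a busy destination are assumptions about how such a protocol might be completed; the cited proposals differ in these details.

```python
class ControlNetwork:
    """Tracks which sender currently holds each destination (hypothetical model)."""

    def __init__(self, num_nodes):
        # owner[dst] is the sender that reserved dst, or None if dst is free.
        self.owner = [None] * num_nodes

    def request(self, src, dst):
        """Process a REQ from src. Returns True (ACK) if dst was free."""
        if self.owner[dst] is None:
            self.owner[dst] = src
            return True
        return False  # assumption: the sender has to retry later

    def release(self, src, dst):
        """Free dst after src has finished its data transmission."""
        if self.owner[dst] == src:
            self.owner[dst] = None


cn = ControlNetwork(num_nodes=4)
print(cn.request(0, 3))  # True: node 0 may transmit to node 3
print(cn.request(1, 3))  # False: node 3 is already reserved
cn.release(0, 3)
print(cn.request(1, 3))  # True: node 1 obtains the destination
```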
Based on this technique, a number of studies further improved these types of WRONoCs by decreasing the number of λ-sets necessary for addressing destinations, effectively reducing the number of wavelengths in the NoC and in turn alleviating laser power requirements. CoNoC [54] uses N/2 λ-sets for addressing, that is, two nodes share the same λ-set address. As this can lead to situations in which two sender-receiver pairs communicate on the same λ-set ("collision"), routing paths through the NoC must be provided to avoid data corruption of two simultaneously transmitted signals. QuT [42] and Amon [112,114] further improve CoNoC by proposing collision-free topologies and routing algorithms that allow four destinations to share the same λ-set for addressing, thereby reducing the number of λ-sets to N/4. Amon reports the lowest number of MRs necessary to enable these designs. Figures 18 and 19 show the topologies of Amon and QuT and wavelength-routing examples that provide collision-free paths when two senders modulate on the same λ-sets to transmit to destinations that share the same λ-set. This illustrates how MRs can be utilized to design sophisticated ONoC architectures with improved efficiency.
Fig. 18. QuT [42]. Fig. 19. Amon [54].
The control network (CN) enables this large reduction of MRs but also poses a performance bottleneck in these NoCs and requires additional resources. Since the CN must provide contention-free communication, efficient CN design is important. The discussed studies show that REQ/ACK packets are small, and thus only little bandwidth/wavelengths are required to enable fast communication. The main overhead was identified to be the number of waveguides to enable the contention-free crossbar [42]. QuT thus proposed an improved CN that utilizes splitters to distribute optical signals over a certain number of waveguides, as shown in Figure 20.
On each waveguide, each node still has its own distinct wavelength set to send to destinations; however, there are only N/8 waveguides, which reduces the number of waveguides and MR filters. Signals on each waveguide are split to N/8 destinations. Therefore, in addition to the packet type, senders also need to encode the destination ID so that destinations know whether a packet was meant for them or not. This increases the control packet size; however, the benefit of fewer waveguides was shown to considerably outweigh this drawback.
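The savings from λ-set sharing can be tabulated directly from the sharing degrees given above. The Python sketch below assumes 8λ per λ-set and a 64-node network purely for illustration; the helper names are hypothetical.

```python
def addressing_lambda_sets(num_nodes, destinations_per_set):
    """Number of λ-sets needed when several destinations share one λ-set."""
    if num_nodes % destinations_per_set:
        raise ValueError("node count must be a multiple of the sharing degree")
    return num_nodes // destinations_per_set


def qut_control_waveguides(num_nodes):
    """Waveguide count of the improved QuT control network (N/8)."""
    return num_nodes // 8


NODES = 64            # illustrative network size
LAMBDAS_PER_SET = 8   # illustrative λ-set width

# Sharing degree: destinations that use the same λ-set for addressing.
schemes = {"basic WRONoC": 1, "CoNoC": 2, "QuT / Amon": 4}
for name, degree in schemes.items():
    sets = addressing_lambda_sets(NODES, degree)
    print(f"{name:13s}: {sets:2d} λ-sets, {sets * LAMBDAS_PER_SET:3d} wavelengths")
print(f"QuT control network waveguides: {qut_control_waveguides(NODES)}")
```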

Maximizing Bandwidth Utilization
Rather than saving wavelengths in the network by reusing them for addressing, a number of approaches let nodes share wavelengths directly on the bus by utilizing an arbitration mechanism to implement a TDM approach. This section discusses the different arbitration mechanisms and bandwidth sharing approaches that have been evaluated in recent work.

Arbitration Schemes for Channel Sharing Approaches.
Token ring arbitration has been evaluated by a number of proposals [78,90,108,109]. Corona [108] has an optical crossbar on which a node is permitted to send data by contending for the bus on an optical token ring arbitration network, implemented as an MWSR bus. This study is extended through a detailed analysis of channel-based and slot-based distributed token arbitration, and their suitability for optics [109]. Their schemes vary priorities dynamically to ensure fairness. In FlexiShare [90], a reduced number of channels are globally shared by decoupling the allocation of the channels and buffers, leading to slight additional power and area overheads. Their token stream mechanism for channel arbitration and credit distribution, however, is highly efficient as it halves the number of utilized channels compared to a conventional crossbar under balanced, distributed traffic. R-3PO [78] utilizes a token-based control network to handle accesses in a 3D NoC with an optical crossbar. Generally, the main issue with token ring arbitration is that it leads to an increase in latency with an increasing number of nodes due to longer waiting times for receiving a token. Therefore, a number of alternative arbitration schemes have been proposed to alleviate this problem: Featherweight [91] is a lightweight arbitration scheme with QoS support that implements a feedback-controlled, adaptive source throttling scheme to asymptotically approach weighted max-min fairness among all nodes. Featherweight provides up to 87% power reductions with negligible throughput loss.
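The scaling issue of token ring arbitration can be illustrated with a toy model. The Python sketch below assumes a single token that advances one node per time step and a requester whose distance to the token is uniformly distributed; both are simplifying assumptions and do not describe the timing of any of the cited designs.

```python
def mean_token_wait(num_nodes, steps_per_hop=1):
    """Average waiting time (in steps) until the token reaches a requester.

    The token is assumed to be 0..num_nodes-1 hops away with equal probability
    and no other node holds on to it (uncontended case).
    """
    waits = [distance * steps_per_hop for distance in range(num_nodes)]
    return sum(waits) / num_nodes


for nodes in (16, 64, 256):
    print(f"{nodes:3d} nodes: mean wait {mean_token_wait(nodes):6.1f} steps")
```

Even without contention, the mean wait in this model grows linearly with the number of nodes sharing the token.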
In "Channel Borrowing" [118], each channel is allocated to an owner node, but can also be utilized by a few other nodes during idle time. Each node has a statically assigned channel to avoid starvation and can borrow an additional idle channel to boost bandwidth and improve network utilization. They propose a selection policy for choosing a channel to be borrowed that enables low probability of conflict, and a distributed arbitration mechanism to resolve contention of multiple nodes wishing to borrow the same channel.
GASOLIN [69] proposes pipelined distributed global arbitration for MWMR crossbars that allows one to parallelize the arbitration process and simplifies arbiter design. A distributed arbiter is implemented at each node, and the arbiters share global request information to identify free channels and maximize bus utilization. Compared to token-based arbitration, GASOLIN reduces the number of channels by 50%.
SUOR [116] implements a bidirectional ring waveguide that is divided into multiple nonoverlapping sections that can be utilized independently, thereby supporting multiple transactions simultaneously. Their hybrid control network consists of agents-one for each node-through which nodes can access the ring waveguide. Agents communicate with processing nodes optically with low delay and share information with each other over short electrical wires.
LumiNOC [63] implements a shared optical bus on which wavelengths are used for both data transmission and arbitration, thereby saving the overheads of a separate arbitration network. Their buses have an arbitration phase prior to the data transmission phase and form a double-back waveguide on which sending nodes will also receive the packet they were transmitting, which is important in their arbitration phase: all nodes are synchronized at the beginning of the arbitration phase, and every node in the bus is assigned to one unique subset of wavelengths and modulates an arbitration flag containing the destination address, source address, and packet size indicator on every other node's wavelength set. All packet fields are 1-hot encoded; that is, for instance, when a node wants to use the bus, it sets the bit corresponding to its address to "1" in the source address field. By the time all arbitration flags have been received by all nodes, the source address field will be analyzed to see whether there is only one node that wants to use the bus or whether there has been a corruption (i.e., >1 bit is set to "1" in the source address field). In case there is only one sender, each node knows who the destination is (encoded in the destination address field) and the packet length from which they can infer the duration of data transmission. Nodes will use this information to either (1) detune their MR filters if they are not a receiver or (2) tune in their MR filters for the corresponding duration of transmission. If data corruption occurred, all contending nodes enter a dynamic scheduling phase in which all senders are scheduled sequentially on the entire optical bandwidth. After data transmission, all nodes tune in their MR filters and enter the arbitration phase again. Evaluation results show that this approach leads to large power savings and good throughput on the optical buses.
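The collision check on the 1-hot encoded source address field can be sketched in a few lines. In the Python sketch below, simultaneous arbitration flags are modeled as a bitwise OR of the 1-hot fields; this superposition model, the function name, and the return format are assumptions made for illustration and do not reproduce the LumiNOC implementation.

```python
def arbitrate(requests, num_nodes):
    """Resolve one arbitration phase on a shared bus.

    requests: dict mapping sender id -> (destination id, packet size).
    Returns ("idle", []), ("grant", [(src, dst)]), or ("collision", [senders]).
    """
    src_field = 0
    dst_field = 0
    for src, (dst, _size) in requests.items():
        src_field |= 1 << src  # sender sets the bit of its own address
        dst_field |= 1 << dst  # and the bit of the intended destination
    senders = [n for n in range(num_nodes) if (src_field >> n) & 1]
    if not senders:
        return "idle", []
    if len(senders) == 1:
        # Exactly one sender, so the destination field is a valid 1-hot value.
        dst = dst_field.bit_length() - 1
        return "grant", [(senders[0], dst)]
    # More than one bit set: contenders are scheduled one after another.
    return "collision", senders


print(arbitrate({2: (5, 1)}, num_nodes=8))
print(arbitrate({1: (3, 1), 4: (3, 2)}, num_nodes=8))
```

In the collision case, the returned sender list corresponds to the nodes that would enter the dynamic scheduling phase described above.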
"Wavelength Stealing" [124] enables opportunistic channel sharing without incurring any arbitration overheads by implementing collision recovery. Similarly to "Channel Borrowing," each node has one dedicated channel to each destination on which service is always guaranteed, essentially implementing a point-to-point (P2P) network. In addition, senders can steal access to channels owned by other senders that lead to the same destination, enabled by placing additional modulator MRs along the shared waveguides. Service on the stolen channels is not guaranteed and is performed arbitration-free: owners of the channels are not notified about the "theft," and collisions that arise from it are corrected at the destination node using erasure coding.

Adaptive Bandwidth Scaling.
Chameleon [60] allows the bandwidth for communication between cores to be adapted according to the communication requirements. This is controlled at each core by an optical network interface (ONI), which is configured at runtime to open up dedicated P2P communication channels between two cores. Configuration of ONIs is controlled by an electrical control network. Their configuration mechanism allows one to reuse wavelengths to realize several independent communications on a single waveguide, allowing for high flexibility at the cost of electrical overheads of the control network and ONIs.
CLAP-NET [50] implements P2P links in the network topology whose bandwidth is shared based on the utilization rate of the individual cores accessing them. A bandwidth allocation algorithm determines the wavelength assignment to requesting nodes within a reconfiguration period, after which modulators/MR filters are tuned to implement the newly calculated bandwidth allocation. In this algorithm, every core initially has a certain set of wavelengths, of which at most 90% can be reallocated to other cores to avoid complete starvation. Information regarding the current network status (i.e., statistics on the utilization rate of the cores) is relayed on a separate optical control network.
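A utilization-driven allocation with a guaranteed minimum can be sketched as follows. The Python sketch below is not the CLAP-NET algorithm: it uses a simple proportional-share rule with largest-remainder rounding as a stand-in, and keeps only the constraint stated above that at most 90% of a core's initial wavelengths may be given away. All names and the example utilization values are hypothetical.

```python
def reallocate(base_lambdas, utilization, max_realloc=0.9):
    """Redistribute wavelengths among cores in proportion to their utilization.

    base_lambdas: wavelengths each core owns initially.
    utilization:  one non-negative utilization value per core.
    Every core keeps at least the non-reallocatable part of its initial set.
    """
    n = len(utilization)
    reserved = base_lambdas - int(base_lambdas * max_realloc)
    pool = n * (base_lambdas - reserved)
    total = sum(utilization)
    if total == 0:
        return [base_lambdas] * n  # nothing to adapt to, keep the initial sets
    shares = [pool * u / total for u in utilization]
    alloc = [reserved + int(s) for s in shares]
    # Wavelengths lost to rounding down go to the largest fractional remainders.
    leftover = max(n * base_lambdas - sum(alloc), 0)
    by_remainder = sorted(range(n), key=lambda i: shares[i] - int(shares[i]),
                          reverse=True)
    for i in by_remainder[:leftover]:
        alloc[i] += 1
    return alloc


print(reallocate(10, [9, 1, 0, 0]))
```

In this example, the two idle cores keep one wavelength each, while the busiest core receives most of the shared pool.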
PROBE [123] performs prediction-based optical bandwidth scaling by utilizing a history-based prediction scheme based on past link utilization. Laser sources are split over multiple waveguides in the topology, and bandwidth on each waveguide is tuned according to the past network traffic to meet performance requirements by dynamically shutting down portions of the NoC. Their bandwidth control mechanism sets the bandwidth globally while collecting resource utilization of cores and tuning MRs locally.
ColdBus [95] utilizes a novel method for predicting traffic using program counter addresses rather than past network activity. To deal with mispredictions, they implement this method in a predictor block that is interfaced by separate waveguides.
In addition to the discussed approaches, we note that a bandwidth utilization approach proposed for radio-frequency interconnects rather than optics can also be applied to shared optical buses and might provide large energy benefits [117].

WRONoC: Optical Crossbars.
All-optical NoCs based on global crossbars require large numbers of MRs and thus consume considerable MR tuning power. Laser power requirements are also too large due to excessive numbers of required wavelengths. Moreover, MRs have a fairly large footprint compared to their electronic counterparts, leading to higher manufacturing costs. To enable the adoption of all-optical NoCs, designers have to provide NoC designs that efficiently utilize the available optical resources. Nonarbitrated WRONoCs seem to fail at satisfying these demands, especially with increasing network sizes.

Control-Network-Based WRONoCs.
Approaches utilizing an underlying control network to decrease the number of MRs and reuse wavelengths for addressing are a step in the right direction. Recent research efforts managed to tremendously reduce the number of MRs and wavelengths compared to basic WRONoCs, making them more competitive with their electrical counterparts. Simulation results of the discussed proposals (see Section 4.2) show constant average packet latencies across all traffic patterns, apart from Hotspot traffic, which is a pathological case due to the destination checking required prior to data transmission. This constant behavior can be explained by both the transmission mechanism and the propagation delay of light: senders will always perform the same mechanism for data transmission, that is, destination checking plus the actual data transmission upon reception of an ACK. With a 10.45ps/mm propagation delay of optical signals in silicon [43], large distances can be covered within one clock cycle at 5GHz (200ps clock cycle). Current processor technologies do not tend to exceed 5GHz due to power dissipation constraints [26,46], and common tile dimensions of ∼1 to 2mm in width/length (e.g., [11]) imply that most destinations in these NoCs can be reached within one cycle for core counts less than 100. The actual traffic pattern therefore has a less severe effect on latency compared to electrical NoCs; however, the underlying control network constitutes a performance bottleneck, particularly for traffic patterns where some cores are accessed disproportionately often, which is the case in some multithreaded workloads [10]. In addition, all of these networks assume 8λ for data modulation at each node, providing a bandwidth of 16 bits/cycle for 10Gb/s modulators and 5GHz core frequency. While this provides very competitive latency for data packets of smaller sizes, such as coherence traffic, for larger data packets, such as cache line transfers, this may impose considerable latencies.
Increasing the bandwidth from 8λ to, for instance, 16λ improves latency for large packets; however, as this bandwidth would have to be provided at each node, this would increase the number of MRs and, in turn, MR tuning power. Therefore, these designs may provide too little flexibility in terms of bandwidth assignment to be an efficient solution for applications with high bandwidth demands.
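As a rough check of the figures quoted above, the following sketch recomputes the optical reach per clock cycle and the serialization latency for 8λ and 16λ. The propagation delay, clock frequency, and modulator rate are taken from the text; the 64-bit and 576-bit packet sizes are the common control and cache line sizes cited from [63]. The function names are illustrative only, and the model ignores destination checking and conversion overheads.

```python
import math

PROP_DELAY_PS_PER_MM = 10.45  # optical propagation delay in silicon [43]
CLOCK_GHZ = 5.0               # core frequency assumed in the text
MOD_RATE_GBPS = 10.0          # data rate of one modulator (one wavelength)


def reach_per_cycle_mm(clock_ghz=CLOCK_GHZ):
    """Distance light covers in the waveguide within one clock cycle."""
    cycle_ps = 1000.0 / clock_ghz
    return cycle_ps / PROP_DELAY_PS_PER_MM


def bits_per_cycle(wavelengths, clock_ghz=CLOCK_GHZ, rate_gbps=MOD_RATE_GBPS):
    """Bits a node can modulate per core clock cycle."""
    return wavelengths * rate_gbps / clock_ghz


def serialization_cycles(packet_bits, wavelengths):
    """Cycles needed to push a packet onto the link (serialization only)."""
    return math.ceil(packet_bits / bits_per_cycle(wavelengths))


print(round(reach_per_cycle_mm(), 1))   # 19.1 (mm per 200ps cycle)
print(bits_per_cycle(8))                # 16.0
print(serialization_cycles(64, 8))      # 4
print(serialization_cycles(576, 8))     # 36
print(serialization_cycles(576, 16))    # 18
```

Under these assumptions, a cache line transfer spends far more cycles on serialization than on propagation, which is why doubling the wavelength count helps large packets in particular.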

Adaptive Bandwidth Scaling and Sharing.
Given that static power accounts for the vast majority of power in ONoCs, maximizing the utilization of the available resources is essential for efficient designs. For many multithreaded applications, traffic patterns are often irregular and contain phases of very low and very high utilization. Therefore, dynamic allocation of optical bandwidth can lead to maximum throughput for a given power budget. Token-based arbitration approaches to resolve contention have the inherent problem of limited scalability, as the latency for receiving a token increases along with the network size. More sophisticated arbitration techniques were discussed, with different tradeoffs regarding arbitration latency and overheads in the control network. The tradeoff between flexibility in bandwidth allocation and arbitration complexity has been evaluated, and distributed and centralized approaches have been compared. Tuning and detuning of MRs enables many design opportunities to efficiently utilize the available wavelengths. While, for instance, LumiNOC schedules nodes contending for the bus sequentially on the entire bandwidth, a more fine-grained approach could be chosen where not only time slots but also subchannels could be assigned, which would further improve throughput on these buses. Previous analyses indicate that both slot- and channel-based arbitration lead to ideal throughput and fairness on the bus [109].
While shared optical buses make maximum use of the available wavelengths, they may suffer from considerable ring-through losses: in a shared optical bus, a larger number of MRs are passed by the optical signal, which introduces ring-through losses even when they are detuned. For instance, when 16 nodes are connected to one shared bus and each of them has 64 modulators for data transmission, that makes 15 × 64 = 960 MRs to be passed (assuming data is filtered before a node's own modulators). While 0.01dB MR through loss has been demonstrated, many recent studies utilize device predictions of 0.001dB or 0.0001dB, which can have a tremendous impact on the total ring-through loss, and thus on IL max, in this case (9.6dB vs. 0.96dB vs. 0.096dB). Simulations using DSENT [103] show that on an MWMR bus with 16 nodes and 64 modulators each, the laser power increases by 5× when using 0.01dB as opposed to 0.001dB. Bandwidth scaling and sharing degree should thus be considered carefully.
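The ring-through arithmetic above can be reproduced with a few lines. This is a minimal sketch that only accumulates the per-ring through loss in dB; it assumes, as the text does, that a signal passes the modulators of all other nodes but not the sender's own, and it does not model any other loss component or the resulting laser power. The function name is illustrative.

```python
def ring_through_loss_db(nodes, modulators_per_node, loss_per_ring_db):
    """Total through loss when a signal passes the MRs of all other nodes."""
    rings_passed = (nodes - 1) * modulators_per_node
    return rings_passed * loss_per_ring_db


for loss in (0.01, 0.001, 0.0001):
    total = ring_through_loss_db(16, 64, loss)
    print(f"{loss} dB/ring -> {total:.3f} dB")
# 0.01 dB/ring -> 9.600 dB
# 0.001 dB/ring -> 0.960 dB
# 0.0001 dB/ring -> 0.096 dB
```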

Summary.
To the best of our knowledge, there have only been very limited comparisons between all discussed approaches: crossbar-based WRONoCs, control-network-based WRONoCs, dynamic bandwidth assignment, and maximizing channel utilization. For this reason, an exhaustive study comparing these different approaches under realistic traffic patterns could further guide designers toward the most efficient solutions. Our analysis suggests that traffic-aware shared optical buses are likely to be the most efficient approach for a number of reasons. First, overly complex WRONoCs that implement switching functionality for collision-free paths will probably make the floorplanning and physical layout challenging. In ONoCs, this might lead to much higher optical losses and, in turn, power budgets than initially thought. Sticking to a simple topology with a clear notion of where laser sources and other optical components can be placed is essential. Moreover, these topologies lead to a larger number of waveguides that all have to be provided with light. The number of laser sources that can be coupled into the chip, however, is limited, and large amounts of splitting lead to high losses. Therefore, although wavelengths are being reused, simple shared optical buses that can enable data communication between a number of wavelengths are more efficient in that respect.

HYBRID NETWORKS-ON-CHIP
While all-optical NoCs are of high interest and can indeed become the preferred paradigm in the future, they currently suffer from fairly high static power overheads. With current device technologies, it is believed that hybrid NoCs leveraging both electrical and optical interconnects will allow one to balance the benefits and drawbacks of each technology to reach a sweet spot in terms of latency and power consumption. This, however, requires a detailed knowledge of the implications of implementing electrical and optical links regarding these metrics.

NoCs Based on Electro-Optic Broadband Ring Resonators
The vast majority of research in recent years focused on ONoCs based on passive MR resonators, as do the sections of the article at hand. However, a number of early proposals [1,17,18,40,101,106] were based on electro-optic broadband MR resonators. Rather than just responding to one wavelength like passive MR resonators, broadband MRs guide a large set of parallel wavelengths along an optical path. ONoCs based on broadband MRs utilize a circuit-switching approach in which a path through the optical network first has to be established before data can be transmitted. The ONoC implementing the broadband MRs is used for bulk message transmission and is combined with an electrical packet-switched network for control and short message exchange [101]. The electrical NoC also controls the path setup and release functionality required by the optical network. Latency on the electrical NoC, tuning of the electro-optic broadband MR resonators, and the path reservation mechanisms considerably affect the latency and power of the entire NoC. Transferring large data messages can mitigate the cost of path reservation [3]; however, in the on-chip domain, common packet sizes are small (64 bits for control packets and 576 bits for cache line transfers [63]).
Fig. 21. Meteor [4]. Fig. 22. Atac [55]. Fig. 23. Firefly [89].
Although broadband MR resonators may be efficient for interchip communication, ONoCs based on passive MR resonators are commonly accepted to be more suitable for on-chip networks as they do not require a circuit-switched approach and have a smaller footprint. Apart from the papers listed earlier, the main focus of the community has thus been on designs based on passive MR resonators.

Combining Electrical and Optical Links in the Network Topology
Many hybrid NoC proposals revolve around the idea of implementing both an electrical and optical network and utilize them in a distance-based fashion in which electrical links are utilized for short distances and optical links for long distances. To do this efficiently, a number of nodes are grouped into clusters. Intracluster communication is executed over an electrical network, whereas intercluster communication requires the sender to first send the packet over the optical network to the cluster in which the destination resides. Once the destination cluster is reached, the local electrical network is used to forward the packet to its destination within the cluster.
Meteor [4] utilizes this approach and evaluates such an NoC with varying cluster sizes. Based on their results, the most efficient way is to cluster 16 nodes (4 × 4 submeshes) for a 64-node NoC with an 8 × 8 layout. This is illustrated in Figure 21. Gateway routers for accessing the optical NoC for intercluster communication constitute the access points of each cluster to the optical network, which consists of four MWMR buses of 64λ bandwidth each. Each gateway router has access to all four of these buses.
Atac [55] (see Figure 22) proposes both a 64-node and 1,024-node version. In the former, the cluster size is one (i.e., each node has access to the optical network), which is a global crossbar with 32 SWMR buses of 64λ bandwidth each. Rather than dealing with fixed clusters, nodes transmit data on the optical network if the destination is farther than four hops away. Otherwise, the electrical NoC is used. Providing all-to-all global optical communication, however, is highly inefficient due to large static power overheads. Atac's 1,024-node version comes closer to the previous approach as it utilizes the same network as in the 64-node case, but clusters 16 nodes at each access point. Their original proposal performs intracluster communication over a 2D electrical mesh; however, they refine this in a follow-up study by replacing the mesh with a more efficient star network [56].
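The distance-based selection described for the 64-node Atac configuration can be sketched as follows. This is an illustration under stated assumptions, not Atac's actual implementation: nodes are assumed to be numbered row by row on an 8 × 8 grid, and the hop count is assumed to be the Manhattan distance on the electrical mesh. Only the four-hop threshold comes from the text; the function names are hypothetical.

```python
def hops(src, dst, width=8):
    """Manhattan distance between two nodes numbered row by row (assumed)."""
    sx, sy = src % width, src // width
    dx, dy = dst % width, dst // width
    return abs(sx - dx) + abs(sy - dy)


def pick_network(src, dst, threshold=4, width=8):
    """Use the optical network only if the destination is farther than
    `threshold` hops away; otherwise stay on the electrical mesh."""
    return "optical" if hops(src, dst, width) > threshold else "electrical"


print(pick_network(0, 4))    # electrical (exactly four hops)
print(pick_network(0, 5))    # optical (five hops)
print(pick_network(0, 63))   # optical (corner to corner, 14 hops)
```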
Firefly [89], as shown in Figure 23 for 64 nodes, concentrates four cores at each router and defines fixed clusters containing four routers each. Intracluster communication is executed over an electrical 2D mesh. Each router in a cluster has a dual in every other cluster, with which it is connected optically (e.g., C0R0, C1R0, C2R0, and C3R0); together, these routers form what the authors refer to as an "Assembly." For intercluster communication, packets are therefore sent over the optical network to the router in the Assembly that resides in the same cluster as the destination. From that point, packets are forwarded to the destination on the electrical mesh. Optical links connecting the nodes of an Assembly are implemented as R-SWMR buses. The formation of Assemblies decreases the number of nodes that form a crossbar and thereby reduces the total number of MRs. Electrical links are efficiently utilized for short distances. Concentrating four nodes at each router requires correspondingly high bandwidth on the links to avoid early saturation. Their evaluation results show that the traffic pattern has a large impact on performance, with localized patterns being more benign.
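The two-stage routing described for Firefly can be sketched as below. Routers are identified by a hypothetical (cluster, router) pair, and an Assembly is assumed to consist of the routers sharing the same router index across clusters. The sketch returns route segments rather than individual mesh hops and leaves out arbitration on the R-SWMR buses; it is not taken from the Firefly paper.

```python
def firefly_route(src, dst):
    """Return the route segments from src to dst.

    src and dst are (cluster, router) tuples. An intercluster packet first
    crosses its Assembly optically to the destination cluster and is then
    forwarded on that cluster's electrical mesh.
    """
    (src_cluster, src_router), (dst_cluster, dst_router) = src, dst
    segments = []
    if src_cluster != dst_cluster:
        # Optical hop to the Assembly member inside the destination cluster.
        segments.append(("optical", src, (dst_cluster, src_router)))
    if src_router != dst_router:
        # Remaining distance is covered on the electrical intracluster mesh.
        segments.append(
            ("electrical", (dst_cluster, src_router), (dst_cluster, dst_router))
        )
    return segments


print(firefly_route((0, 1), (2, 3)))
# [('optical', (0, 1), (2, 1)), ('electrical', (2, 1), (2, 3))]
```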
ORNoC [59] deploys a topology similar to that of Firefly in the sense that cores are grouped in clusters and utilize an electrical mesh network for intracluster communication. Intercluster communication takes place through the optical network ORNoC, consisting of an optical ring on which an automatic wavelength assignment mechanism is performed for contention-free use of shared optical resources. Each cluster contains a gateway router through which all nodes of a cluster can access the optical NoC. Packets are routed to/from the gateway over the electrical intracluster mesh when a destination resides in a different cluster.
Lego [113] proposes to lower the optical bandwidth of optical links in a topology to reduce laser power while overcoming the latency drawbacks of the increased serialization by performing distance-based routing: short distances are covered using electrical links, and longer distances are reached using the low-bandwidth optical links. HOME [74] is a hierarchical NoC that utilizes a packet-switched electrical mesh network for intracluster communication and a circuit-switched optical network for intercluster communication. Four nodes are clustered at one "HOME" router through which both the electrical and optical networks are interfaced.
Phastlane [24] utilizes optical links not based on distance, but based on packet size. It combines a packet-switched mesh network with an optical, contention-free crossbar on which it transmits cache lines over several hops in one cycle. The optical crossbar utilizes a simple, predecoded source routing approach.

Combining Electrical Routers with Optical Links
A number of proposals use electrical links only to connect nodes to their input router, utilize optical links for global interconnects, and rely on intermediate electrical routers to implement the routing.
The silicon-photonic Clos network (PClos) [47] uses point-to-point optical channels for low-energy, long-distance data transmission. Both the routers and the links between the cores and routers are electrical; connections between routers are optical. Clos networks have high path diversity and show constant performance across all traffic patterns; however, each message has to pass through all router stages, which is inefficient for applications that leverage locality of cores or perform near data processing [5]. BLOCON [48] is a bufferless implementation of PClos that features a scheduling algorithm and path allocation scheme for managing routing in the Clos. It provides low latency and high throughput, but also has higher MR heating and laser power compared to PClos.
An approach similar to PClos is taken in PROPEL [76], which also utilizes optical links for data transmission between intermediate routers. In a mesh-like layout, each node can send and receive data to/from every other node in the same row and column, where an MWSR bus is dedicated to each node. If a destination does not reside in the same row or column as the sender, XY-routing is performed: the packet is first routed to the router in the same row that resides in the same column as the destination node, and is subsequently forwarded to the destination over the column link connecting the intermediate router and the destination. Four cores are clustered at each node, and 16λ bandwidth is provided between any two routers. Since PROPEL basically implements optical crossbars in each X and Y direction, scaling it up directly would lead to large optical resource requirements. Consequently, the authors propose E-PROPEL, a 256-node solution that clusters four 64-node PROPELs and connects them using optical crossbars in a way that a fat tree topology with multiple roots is created, which provides more efficient bandwidth scalability.
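The XY-routing rule described for PROPEL amounts to at most two optical hops. The sketch below uses hypothetical (row, column) router coordinates and returns the sequence of optical links taken; bus arbitration and the concentration of four cores per router are left out.

```python
def propel_route(src, dst):
    """Return the optical links used between two routers.

    Routers are (row, column) tuples. Routers sharing a row or a column are
    reached directly; otherwise the packet goes via the router in the
    sender's row and the destination's column.
    """
    if src == dst:
        return []
    (src_row, src_col), (dst_row, dst_col) = src, dst
    if src_row == dst_row or src_col == dst_col:
        return [(src, dst)]
    intermediate = (src_row, dst_col)
    return [(src, intermediate), (intermediate, dst)]


print(propel_route((1, 2), (3, 0)))
# [((1, 2), (1, 0)), ((1, 0), (3, 0))]
```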
MPNOC [121] concentrates four cores at each router and implements four clusters of 64 nodes (or 16 routers). MPNOC utilizes a 3D approach where 16 decomposed optical crossbar slices are each placed on a separate optical layer to minimize the number of waveguide crossings. Each slice is thereby a 16×16 crossbar that connects all tiles from one cluster to another (intercluster communication) or all tiles within the same cluster (intracluster communication). Crossbars are composed of MWSR buses, one for each receiver.
The PHiCIT [100] proposal takes the opposite approach to the previous ones in the sense that it divides the NoC into equally sized clusters but utilizes a 2D electrical mesh for intercluster communication and optical crossbars for intracluster communication. The authors argue that intercluster communication has low activity during application execution; however, these few messages demand high throughput as they are required for main memory, task migration, or internal synchronization traffic. For high throughput, electrical links are cheaper in the sense that they provide higher throughput for much less data-independent power, which makes them more suitable for these types of traffic patterns. Clusters, on the other hand, are organized by computation complexity, communication requirements, and functional relationship of IP cores, leading to much higher traffic demands, making an optical crossbar more suitable, particularly for small cluster sizes. However, their study lacks a comparison to alternative hybrid NoCs, which makes it hard to assess the efficiency of this approach.

Discussion
Combining electrical and optical links is considered the natural evolution of electrical communication, not only due to the well-studied properties of electrical interconnects and their established manufacturing technology, but also because it allows one to offset the shortcomings of both technologies. Recent studies, as well as our analysis in Section 3, indicate that electrical links are the preferred technology for high-throughput, short-distance communication, whereas optical links provide higher energy efficiency and lower latency at longer distances with similar throughput levels. These attributes were leveraged either by electrical clustering, where a number of cores are grouped around a gateway router that enables access to the optical network, or by implementing an electrical network (mostly 2D mesh) on top of an optical network to perform distance-based routing, where the more efficient technology is used based on latency, throughput, and energy efficiency. Moreover, combining electrical and optical links takes load off the optical network and allows for lower MR counts and throughput requirements in the optical part of the network.
Although the discussed designs illustrate that optical links can indeed lead to higher power efficiency compared to electrical NoCs, there have been a number of overlooked aspects that hinder the proposed designs from becoming even more efficient. A number of approaches rely on clustering nodes around routers that enable the access to the optical network. While this helps to decrease static optical power by sharing resources, excessive clustering can lead to long link lengths and a challenging layout. For instance, for a clustering of eight (e.g., [47]), this might lead to electrical links considerably exceeding 1mm (for 1 to 2mm tile widths), which leads to large energy consumption (see Figure 12). Therefore, although it is an efficient way of decreasing the required optical resources in the NoC, clustering should be considered carefully with respect to tile and cluster sizes, particularly for future technology nodes where the energy and latency gap between transistors and electrical interconnects becomes wider.
Another popular approach is the combination of an electrical baseline network for short-distance traffic and, on top of that, an optical network for longer distances. Many studies illustrate that this indeed leads to energy and latency savings compared to using electrical or optical networks only, as the two technologies offset each other's drawbacks.
All in all, many studies were conducted with interesting results that suggest that with further device technology advancements, these NoCs could indeed be adopted for future designs; however, the design space is large and future challenges exist. In addition, on top of combining optical links with electrical links, they could also be combined with wireless on-chip data transmission, another promising paradigm currently evolving. To the best of our knowledge, this has only been addressed by Iris [65], which combines optical links, a dielectric antenna-array-based broadcast network, and a circuit-switched electrical mesh network.

ADAPTIVE LASER SOURCES AND CONTROL MECHANISMS
Although the previously discussed research efforts in ONoC architecture have led to designs efficiently utilizing optical on-chip resources, they still cannot fully diminish inefficiencies of static, data-independent laser power consumption. While this power overhead can be mitigated for communication-intensive workloads, it becomes more significant for applications that feature frequent periods of idleness. Unfortunately, this is the case for a number of application domains, such as scientific computing, where compute-intensive execution phases underutilize the communication fabric [29], or server computing; for example, Google-scale datacenters were shown to have typical utilization rates of 30% [6]. Switching the laser source on and off based on the current traffic demands would be ideal in these cases to improve power efficiency.
Off-chip laser sources are difficult to turn on/off as this would introduce large latency and energy overheads for going off-chip to perform the communication between the control circuit and the laser source. On-chip lasers, however, are close enough to be shut down and restarted quickly. In fact, DWDM-compatible, on-chip Ge-based lasers have been demonstrated that can be switched on/off within 1ns [68], thereby enabling efficient laser control. Apart from the energy required to implement the laser source control, the laser source only consumes power when turned on, which tremendously decreases power compared to the static, nonadaptive case. This section discusses recently proposed laser control mechanisms, which should ideally impose minimal overhead in terms of latency, energy, and area. Atac+ [56] not only saves power by controlling an adaptive laser source to switch on/off when needed but also adjusts its output power levels dynamically to the number of receivers on the SWMR bus. This allows for additional power savings in the unicast mode since not all receivers on the bus need to be driven in this case. A lightweight "select link" is implemented on which destinations are notified that they will receive data on the data link, similar to the R-SWMR bus. The laser source attached to the data bus is therefore switched off when idle and switched on with different power levels for either unicast or broadcast operation.
Although previous studies assume an off-chip laser source and thus evaluate their design with longer switching times [21], their approach can be applied to on-chip lasers too. Their control mechanism dynamically reconfigures the NoC based on network utilization. Cores are assigned to groups that are provided with optical bandwidth. They periodically send their average packet latency in fixed time intervals to a laser power controller, which then in turn switches laser sources on or off, based on the utilization of that router group. This mechanism allows one to identify phases of high communication in applications.
Another approach is to switch optical links on/off based on the L2 cache replacement rate metric, which is used to determine the required change in the L2 cache bank count, and hence the number of optical links to be switched on/off along with a dynamic activation/deactivation of L2 cache banks [20]. This allows one to scale the optical bandwidth based on the memory access patterns of an application.
EcoLaser [29] is a collection of static and dynamic laser control mechanisms for SWMR and MWSR crossbars. For a crossbar consisting of SWMR buses, each sender writes on its own SWMR bus and reads from other routers' buses. EcoLaser adds laser control circuitry to each sender backend router, and these only turn on the laser if there is a message at any of the injection buffers and do not turn it off unless all buffers are empty or the laser has stayed on for a predefined minimum laser stay-on time. The switch controller in the router backend has to wait until the laser is turned on, at which point the controller will notify the switch allocation, which will then forward the messages to the modulators. In an MWSR crossbar, each router reads from its own bus and writes on the other nodes' buses. In this case, each receiver holds the laser for its dedicated bus, which, however, complicates laser control as the receiver does not know what senders want to transmit. Therefore, a separate token-ring waveguide is utilized on which senders can forward "laser turn-on requests." EcoLaser+ [31] improves upon EcoLaser as it minimizes wasted energy by keeping the optical bus turned off during data transmission of small messages (i.e., cache coherence traffic). Moreover, performance benefits are achieved by turning the laser on proactively when possible. Compared to a perfect laser power controller that has full knowledge of future bus utilization, EcoLaser+ is only 2% to 6% less efficient, and saves up to 92% laser power compared to a static laser scheme.
However, crossbars consisting of SWMR or MWSR buses scale poorly due to a higher number of MRs, as identified in previous sections. Focus has thus been put on more practical network topologies with better scalability, namely, the flattened butterfly topology. SLaC [32] is a laser gating technique that turns off redundant paths in this topology to save energy while maintaining high performance and full connectivity, a property that cannot be provided in optical crossbars, where laser gating disrupts the connectivity of the network. Their reported results indicate that, indeed, the laser turn-on latency can be removed, leading to minimal performance degradation.
A promising proposal is also to design a NoC specifically for the paradigm of partitioning computation and communication resources together to enable application concurrency [87]. A partitioning technology for WRONoCs is proposed, including an algorithm for online allocation of wavelengths aiming at maximum reuse across partitions, thereby powering off unused laser sources.
Leveraging on-chip semiconductor optical amplifiers (SOAs) combined with electro-optic comb switches has been proposed to achieve both traffic-independent and loss-aware savings in laser power in MWMR shared buses [105]. As discussed in Section 2.2, nodes on an MWMR bus first execute the bus arbitration phase, after which one node "owns" the bus. Once this phase is finished, the following steps are executed to adapt the laser power to the current traffic pattern (Figure 24 illustrates the placement of the building blocks and their interaction). Before any data transmission, the comb switch is turned off so that no laser power is drawn from the laser source. Prior to sending, the node owning the bus switches on the comb switch, which will then extract the minimum laser power required to drive a photodetector. The SOA amplifies the signal extracted by the comb switch based on the relative position of the sender and receiver, which determines the path losses for this particular sender-receiver pair. This information (i.e., the insertion losses to each receiver) is stored in a lookup table at each sender. Therefore, every sender controls the amount of amplification based on its receiver. This allows one to draw, at all times, the minimum required amount of laser power from the source based on the current IL max, leading to 31.5% laser power reduction with low latency overheads.
One study proposed to extend the model of turning on-chip lasers on and off by adding two more states: "standby" and "intermediate" [57]. The former state operates the laser at a bias point slightly above its threshold current at which it cannot be used for data transfer, but allows for reduced turn-on times and thus less performance degradation. The latter state dynamically operates the laser at just enough optical power for the desired communication bandwidth, thereby reducing energy overheads.
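The per-sender lookup table used for loss-aware amplification in [105] can be illustrated with a small sketch. The linear loss model (a fixed loss plus a per-node loss along the bus), its parameter values, and all function names are hypothetical placeholders chosen for illustration; they are not taken from [105]. The sketch only shows the idea that the SOA gain follows the loss of the current sender-receiver pair instead of the worst case.

```python
def build_loss_table(sender, n_nodes, fixed_loss_db, loss_per_node_db):
    """Insertion loss (dB) from `sender` to every other node on the bus.

    Hypothetical model: a fixed loss plus a loss that grows with the number
    of node positions between sender and receiver.
    """
    return {
        receiver: fixed_loss_db + abs(receiver - sender) * loss_per_node_db
        for receiver in range(n_nodes)
        if receiver != sender
    }


def soa_gain_db(loss_table, receiver):
    """Gain needed to compensate the path loss to this receiver only."""
    return loss_table[receiver]


def saving_vs_worst_case_db(loss_table, receiver):
    """Headroom relative to always provisioning for the worst-case loss."""
    return max(loss_table.values()) - loss_table[receiver]


table = build_loss_table(sender=0, n_nodes=4, fixed_loss_db=1.0, loss_per_node_db=0.5)
print(table)                               # {1: 1.5, 2: 2.0, 3: 2.5}
print(soa_gain_db(table, 1))               # 1.5
print(saving_vs_worst_case_db(table, 1))   # 1.0
```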
In addition, jointly tuning the on-chip lasers and MRs to align the laser wavelength with the MR's resonant wavelength was shown to lead to lower power consumption than alternative approaches while still meeting bit error rate requirements [64].

Discussion
The ability to switch laser sources on and off based on the traffic demands, or to adjust the laser output power dynamically based on the path losses between the currently communicating sender-receiver pairs, clearly provides tremendous laser power savings (up to 92%). Reducing the latency and power overheads of the laser control mechanism is key to an efficient implementation and has been addressed in the discussed proposals; however, this is still a very young research field that offers designers many opportunities to investigate novel control techniques. The latency and energy required for laser control suggest that the availability of on-chip lasers will have a large impact on their efficiency, as controlling an off-chip laser incurs high overheads.
Combining the proposed laser control techniques with the bandwidth sharing proposals of Section 4.3 is particularly intriguing. Most of these proposals assume a static laser source and propose efficient arbitration techniques for bandwidth assignment. While these designs were shown to be much more efficient than simple buses, their efficiency could be improved even further by including a dynamic laser control scheme. Latency is already imposed during arbitration; therefore, laser control could be performed simultaneously. For instance, in the R-SWMR bus (Figure 8), a sender first broadcasts a reservation flit on the control link to notify destinations to tune in their MR filters. Simultaneously, the laser source on the data link could be turned on to provide the light as soon as data transmission begins, and otherwise be turned off. Similarly, the control-network-based WRONoCs (Section 4.2) could leverage dynamic power control too. Each source needs to check a destination prior to data transmission on the control network and wait for the corresponding ACK. When sending out an ACK, the destination could, in parallel, notify the laser source for this particular channel to turn on when data transmission is about to begin, thereby hiding latency related to laser control. In this way, only the laser sources providing the low-bandwidth control networks would have to be switched on constantly, while on the data network, where the vast majority of laser power is drawn, the laser sources could be switched on to certain destinations only when transmission is actually scheduled. These are merely two examples of the intriguing opportunities of combining existing ONoC architectures with adaptive laser control techniques.
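The latency-hiding argument above can be made concrete with a simple overlap model. The sketch assumes that laser turn-on starts at the same time as the reservation or ACK handshake, so only the part of the turn-on time that exceeds the handshake is exposed. The 1ns turn-on time [68] and the 200ps cycle come from the text; the handshake durations are hypothetical, and the function names are illustrative.

```python
import math

CYCLE_PS = 200          # clock cycle at 5GHz
LASER_TURN_ON_PS = 1000  # on-chip laser switching time of 1ns [68]


def exposed_latency_ps(handshake_cycles, laser_on_ps=LASER_TURN_ON_PS):
    """Turn-on time not hidden behind the arbitration or ACK handshake."""
    return max(0, laser_on_ps - handshake_cycles * CYCLE_PS)


def exposed_latency_cycles(handshake_cycles, laser_on_ps=LASER_TURN_ON_PS):
    """Same quantity, rounded up to whole clock cycles."""
    return math.ceil(exposed_latency_ps(handshake_cycles, laser_on_ps) / CYCLE_PS)


print(exposed_latency_cycles(3))  # 2 (a 600ps handshake hides part of the turn-on)
print(exposed_latency_cycles(5))  # 0 (a 1000ps handshake hides it completely)
```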

AUTOMATIC DESIGN SYNTHESIS TOOLS FOR ONOCS
Laser power can be reduced tremendously by switching on/off laser sources and adjusting their output power to communication demands; however, it still heavily depends on IL max, which thus deserves separate consideration by the research community.
As discussed in Section 3, IL max depends on the loss parameters of current device technologies, which improve with technological advances and are not under the control of designers. Topology and layout choices, on the other hand, are. Since the field of ONoCs is still in its infancy, the layout and routing of waveguides has been executed based on the designers' intuition, with the goal of implementing feasible designs with few waveguide crossings and bends, short waveguide lengths, and so forth in order to minimize IL max. Clearly, with increasing network sizes and complexities, this will not continue to be a reasonable approach. Automatic synthesis tools for ONoCs, like those already established in electrical VLSI, that exhaustively explore the design space to find the ideal routing and layout for a given topology and thereby minimize IL max would be a big step forward. A number of studies have been conducted that indicate the significance of such tools and the benefits they can provide: Ramini et al. [99] took first steps by providing a characterization and detailed quantification of interaction effects between the utilized technology platform, layout constraints, and network-level quality metrics for a number of different WRONoCs.
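To illustrate how layout choices enter IL max, the sketch below treats IL max as the worst-case insertion loss over all optical paths and sums only the three layout-dependent contributions named above: waveguide length, crossings, and bends. The loss coefficients are arbitrary placeholders rather than device data, the two layouts are invented, and the names are illustrative; a real tool would also account for MR, coupler, and splitter losses.

```python
from dataclasses import dataclass


@dataclass
class Path:
    length_mm: float  # routed waveguide length
    crossings: int    # number of waveguide crossings on the path
    bends: int        # number of waveguide bends on the path


def insertion_loss_db(path, prop_db_per_mm, crossing_db, bend_db):
    """Layout-dependent insertion loss of one path in dB."""
    return (path.length_mm * prop_db_per_mm
            + path.crossings * crossing_db
            + path.bends * bend_db)


def il_max_db(paths, **loss_params):
    """Worst-case insertion loss over all paths of a layout."""
    return max(insertion_loss_db(p, **loss_params) for p in paths)


# Placeholder coefficients, not measured values.
params = dict(prop_db_per_mm=0.25, crossing_db=0.125, bend_db=0.0625)
layout_a = [Path(8.0, 4, 2), Path(4.0, 0, 2)]   # invented example layout
layout_b = [Path(6.0, 1, 2), Path(4.0, 0, 2)]   # same paths, fewer crossings
print(il_max_db(layout_a, **params))  # 2.625
print(il_max_db(layout_b, **params))  # 1.75
```

Because the laser has to be provisioned for the worst-case path, reducing crossings or length on that single path lowers IL max even if all other paths stay unchanged.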
PROTON [12] is a tool for automatic placement and routing of ONoC topologies that supports designers by quantifying the degradation of quality metrics of a logical topology when implemented physically, as well as assessing their feasibility. PROTON places MRs with sufficient spacing and routes waveguides to minimize propagation and crossing losses. Their study compares handcrafted layouts with layouts produced by PROTON for simple WRONoC topologies, and results suggest that PROTON manages to reduce insertion losses by up to 150×, indicating that automatic synthesis tools can provide tremendous benefits for the implementation of ONoCs. PROTON+ [9] is an extension of PROTON dedicated to placement and routing in 3D ONoCs. It provides more flexibility than PROTON as, during placement and routing, it is possible to minimize propagation loss only, crossing loss only, or a combination of both. Moreover, it is technology independent as it allows one to update technology parameters and is capable of predicting their implications on topology quality metrics. It also provides a larger design space exploration, such as varying positions of memory controllers and different connectivity patterns. Evaluation results show that laser power can be reduced by up to 94%, and 8 × 8 topologies can be synthesized within 12 minutes.
Tala et al. [104] propose a methodology that builds upon the combination and aggregation of basic MR filter primitives and systematically synthesizes all points of the WRONoC topology design space. Formal methodologies have also been proposed that capture the dependencies between characteristics of the manufacturing process and attainable performance in WRONoCs, thereby closing the gap between technology providers and system-level designers [94]. Another study [88] focused on synthesizing optical ring topologies. Based on a specification, the algorithm produces MRs with a minimal number of wavelengths and waveguides and provides a layout-aware power estimation to identify the most efficient design point.
While the previous proposals synthesize optical networks based on the topology alone, other studies [120] [27] introduce an optimizer that, rather than only minimizing IL max, offers designers a more accurate, cross-layer floorplan optimization by taking into account effects spanning optical and electrical boundaries, chip thermal profiles, and effects of job scheduling policies. The proposed optimizer minimizes NoC power, including E/O and O/E conversion power, MR heating power, and laser power due to insertion losses. The results show that a cross-layer approach changes the optimal floorplan significantly, with optimal waveguide lengths or thermal tuning changing by up to 4× depending on utilization levels and power of cores, aspect ratios of cores and clusters, and laser source sharing.
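The difference between minimizing IL max alone and minimizing total NoC power can be sketched as follows. This is a simplified illustration, not the optimizer of [120] [27]: the power model, the receiver sensitivity, the laser efficiency, the wavelength count, and both floorplans are assumed values.

```python
def laser_power_mw(il_max_db, receiver_sensitivity_mw=0.01,
                   num_wavelengths=16, laser_efficiency=0.1):
    """Electrical laser power needed so every wavelength survives IL max.

    Simplified model: required optical power grows exponentially with the
    worst-case loss in dB; all default parameters are hypothetical.
    """
    optical_mw = receiver_sensitivity_mw * 10 ** (il_max_db / 10) * num_wavelengths
    return optical_mw / laser_efficiency


def total_noc_power_mw(floorplan):
    """Sum of E/O and O/E conversion, MR heating, and laser power."""
    return (floorplan["eo_oe_mw"]
            + floorplan["heating_mw"]
            + laser_power_mw(floorplan["il_max_db"]))


# Two made-up floorplans: one with short waveguides but MRs close to hot
# cores, one with longer waveguides routed through cooler regions.
floorplans = {
    "loss_optimal": {"il_max_db": 6.0, "heating_mw": 40.0, "eo_oe_mw": 20.0},
    "thermally_aware": {"il_max_db": 8.0, "heating_mw": 15.0, "eo_oe_mw": 20.0},
}

by_loss = min(floorplans, key=lambda n: floorplans[n]["il_max_db"])
by_power = min(floorplans, key=lambda n: total_noc_power_mw(floorplans[n]))
print("lowest IL max:", by_loss)
print("lowest total power:", by_power)
```

Under these assumptions the floorplan with the lowest IL max is not the one with the lowest total power, because the saved laser power is outweighed by additional MR heating power.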

Discussion
Early studies and tool proposals dedicated to ONoC synthesis showed that IL max can be reduced tremendously compared to handcrafted designs. Although synthesis alone will not entirely solve the static power problem in ONoCs, the reported IL max reductions do lead to large laser power savings. Backend design tools that provide a simulation framework for the analysis and evaluation of SiP components are already available, such as PhoeniX [97], Lumerical [72], and VPIphotonic [110], and the development of a complete design flow for SiP is currently under way [73]. These tools, however, provide the physical design implementation based on the specification of the circuit topology. Obtaining this circuit topology from the front-end synthesis flow is still a challenge, although first steps have been taken by, for example, PROTON/PROTON+. Further development of these tools to minimize IL max, assess physical implications of layout on a detailed level, perform exhaustive design space exploration, shorten design time, provide guarantees on design feasibility, and help designers cope with design complexity would allow for higher-quality products while making the design and implementation process more efficient. Software tools that match the standard of electronic design automation tools are thus essential from both an economic and an engineering point of view for designing circuits containing SiP devices.

CONCLUSION
Silicon photonics is widely considered one of the most disruptive technologies to allow for ongoing performance and power scaling in chip design, and its successful implementation would have a transformative impact on the way computers are designed, from rack-scale computing down to many-core chips. This article provides an exhaustive survey of the research efforts that have been conducted in the realm of on-chip optical communication architectures, that is, optical Network-on-Chip designs. This research area has gained increasing attention from the community, and recent publications suggest that efficiently leveraging the high-bandwidth, low-energy transmission properties of optics is coming closer to reality. Designing ONoCs is tightly tied to the underlying technological components and entails a number of design obstacles and challenges, ranging from hybrid NoC design and wavelength-routed ONoCs to adaptive laser source control and automatic synthesis tools. We outlined current design challenges and future work for all crucial research areas of ONoCs, provided a detailed discussion of the state of the art, and identified key enabling technologies. We therefore believe that this article is of interest to a wide audience, from researchers who are new to this field to designers and experts who can use it as a reference and summary of the state of the art.